Abstract - In this paper, we propose methods for handling the
attribute values of object classes in an object-oriented database
with fuzzy and uncertain information, based on the quantitative
semantics of hedge algebras. In this approach, the attribute values
(as well as the method values) of an object class are treated as
interval values, and each interval value is converted into a
subinterval of [0, 1]. The fuzziness of an element of the hedge
algebra is likewise a subinterval of [0, 1]. We therefore present an
algorithm that compares two subintervals of [0, 1] in order to
answer data queries.



I. Introduction



In recent years, information about real-world objects has often
been fuzzy, uncertain, or incomplete, so the traditional
object-oriented database model is inadequate in practice. To solve
this problem, fuzzy object-oriented database models have been
proposed to represent and process objects whose information may be
fuzzy and uncertain.

The attribute values of an object in a fuzzy object-oriented
database are complex. They include linguistic values, numeric
values, interval values, references to objects (which may
themselves be fuzzy), collections, and so on. Thus, when querying
data in an object-oriented database with fuzzy and uncertain
information, the most important problem is to find a method for
handling the fuzzy values and then to build a method for comparing
them. Researchers have pursued many approaches to handling fuzzy
values, such as graph theory [4], fuzzy logic and possibility
theory [2], probability theory [3], and logical foundations [1].
Each approach has advantages and disadvantages.

In 2006, Nguyen Cat Ho et al. proposed a hedge algebra model. In
the hedge algebra approach, linguistic semantics can be
represented by neighborhood intervals defined by the fuzziness
measure of the linguistic values of an attribute, which is treated
as a linguistic variable. On this basis, this paper takes the
domain of a fuzzy attribute to be a hedge algebra and transforms
interval values into subintervals of [0, 1], so that querying and
handling the data of objects with fuzzy and uncertain information
becomes effective.

The paper is organized as follows: Section 2 presents the basic
concepts of hedge algebras as the basis for the following
sections; Section 3 proposes two algorithms, SFTVA and SFTVM, for
searching data with fuzzy conditions on attributes and on methods,
respectively; Section 4 presents examples of searching data with
fuzzy information; and Section 5 concludes.

II. Hedge Algebras 

Following the hedge algebra approach, we give a brief overview of
the basics of hedge algebras and of their ability to represent
semantics through the structure of the hedge algebra [6].

Consider the domain of the linguistic variable Truth:
Dom(TRUTH) = {true, false, very true, very false, more-or-less
true, more-or-less false, possibly true, possibly false,
approximately true, approximately false, little true, little false,
very possibly true, very possibly false}, where true and false are
primary terms and the modifiers very, more-or-less, possibly,
approximately, and little are hedges. The linguistic domain
T = Dom(TRUTH) can then be considered as a linear hedge algebra
X = (X, C, H, ≤), where C is the set of primary terms, considered
as generators; H is the set of hedges, considered as one-argument
operations; and ≤ is an order relation on terms (fuzzy concepts)
"induced" from their natural semantics. For example, based on
semantics, the following order relations hold: false < true; more
true < very true but very false < more false; possibly true < true
but false < possibly false; and so on. The set X is generated from
C by means of the one-argument operations in H.

Thus, a term of X is represented as $x = h_n h_{n-1} \ldots h_1 c$,
$c \in C$. The set of terms generated from a term x is denoted by
H(x). If C has exactly two fuzzy primary terms, then one is called
the positive term, denoted by $c^+$, and the other is called the
negative term, denoted by $c^-$, and we have $c^- < c^+$. In the
above example, true is positive and false is negative.

Thus, let X = (X, G, H, ≤) with $G = \{c^-, W, c^+\}$ and
$H = H^- \cup H^+$, where $H^+ = \{h_1, \ldots, h_p\}$ and
$H^- = \{h_{-1}, \ldots, h_{-q}\}$ are linearly ordered, with
$h_1 < \ldots < h_p$ and $h_{-1} < \ldots < h_{-q}$, where
p, q > 1. We have the following related definitions.
Definition 2.1 [6]. $f: X \to [0,1]$ is a quantitative semantic
function of X if $\forall h, k \in H^+$ or $\forall h, k \in H^-$,
and $\forall x, y \in X$, we have:

$$\left|\frac{f(hx)-f(x)}{f(kx)-f(x)}\right| = \left|\frac{f(hy)-f(y)}{f(ky)-f(y)}\right|$$

For a hedge algebra and a quantitative semantic function, we can
define the fuzziness of a fuzzy concept. Given a quantitative
semantic function f of X, consider any $x \in X$. The fuzziness of
x is then measured by the diameter of the set
$f(H(x)) \subseteq [0,1]$.

Definition 2.2 [6]: A function $fm: X \to [0,1]$ is said to be a
fuzziness measure of terms in X if:

(1) fm is complete, that is, $\forall u \in X$,
$\sum_{-q \le i \le p,\ i \ne 0} fm(h_i u) = fm(u)$.

(2) If x is precise, that is, $H(x) = \{x\}$, then $fm(x) = 0$.
Hence $fm(0) = fm(W) = fm(1) = 0$.

(3) $\forall x, y \in X$, $\forall h \in H$,
$\frac{fm(hx)}{fm(x)} = \frac{fm(hy)}{fm(y)}$. This proportion is
called the fuzziness measure of the hedge h and is denoted by
$\mu(h)$.
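
As a short worked consequence of conditions (1) and (3), the hedge
measures must sum to one. The derivation below is a sketch that
assumes $fm(u) \ne 0$ so that the common factor can be cancelled; it
uses only the symbols of Definition 2.2.

```latex
% From (3), the ratio fm(h_i u) / fm(u) does not depend on u, so
% fm(h_i u) = \mu(h_i)\, fm(u) for every hedge h_i.
% Substituting into the completeness condition (1):
\sum_{-q \le i \le p,\ i \ne 0} fm(h_i u)
  = \sum_{-q \le i \le p,\ i \ne 0} \mu(h_i)\, fm(u)
  = fm(u)
\quad\Longrightarrow\quad
\sum_{-q \le i \le p,\ i \ne 0} \mu(h_i) = 1 .
```

The hedge measures chosen in the example of Section IV agree with
this: 0.35 + 0.25 + 0.2 + 0.2 = 1.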

Definition 2.3 [6]: Let fm be a fuzziness measure of the hedge
algebra X, $fm: X \to [0, 1]$. For each $x \in X$, denote by
$I(x) \subseteq [0, 1]$ the fuzziness interval of x and by
$|I(x)|$ the length of I(x).

A family $J = \{I(x) : x \in X\}$ is called a partition of [0, 1]
if:

(1) $\{I(c^+), I(c^-)\}$ is a partition of [0, 1] such that
$|I(c)| = fm(c)$, where $c \in \{c^+, c^-\}$.

(2) If I(x) is defined and $|I(x)| = fm(x)$, then
$\{I(h_i x) : i = 1, \ldots, p+q\}$ is defined as a partition of
I(x) satisfying the conditions $|I(h_i x)| = fm(h_i x)$ and the
intervals $I(h_i x)$ are linearly ordered.

The set $\{I(h_i x)\}$ is called the partition associated with the
term x. We have

$$\sum_{i=1}^{p+q} |I(h_i x)| = |I(x)| = fm(x)$$

Definition 2.4 [6]: Let $X_k = \{x \in X : |x| = k\}$ and consider
$P_k = \{I(x) : x \in X_k\}$, which is a partition of [0, 1]. We
say that u equals v at level k, denoted by $u =_k v$, if and only
if I(u) and I(v) are both included in the same fuzziness interval
of level k. That is, $\forall u, v \in X$,

$$u =_k v \iff \exists \Delta_k \in P_k : I(u) \subseteq \Delta_k \text{ and } I(v) \subseteq \Delta_k .$$



III. Fuzzy Object-Oriented Database and Data Search Method

Based on the fuzzy object-oriented database model given by
Zongmin Ma [11], a fuzzy class C includes a set of attributes and
a set of methods:

$C = (\{a_1, a_2, \ldots, a_k\}, \{M_1, M_2, \ldots, M_m\})$

where $a_i$ is an imprecise (or precise) attribute and $M_j$ is a
method. An attribute $a_i = \langle n, t \rangle$ has name n and
value t. An attribute value can be one of the following four
cases:

• Precise values: This category involves all the basic values
that usually appear in an object-oriented data model (e.g.,
numeric classes, string classes, etc.). In this case we can
easily manipulate the domain values with the operators
(<, >, =) in the conditional expressions of queries, or we
can build fuzzy conditions to query the data, for example
"show all employee objects whose income is lower than the
average salary".

• Imprecise (or fuzzy) values: The cases with imprecise
values are complex, and linguistic labels [10] are usually
used to represent this kind of value. Different types of
imprecise values must be considered according to the
semantics of the imprecise value. For example: a plant named
thyme grows on humus soil, but whether the level of lighting
it needs is low or average is uncertain; "his height is
about 2 meters"; or the interval of approximately [18, 35]
used to represent the concept of young people.

• Objects: The attribute value may be a reference to another
(complex) object. The referenced object may be fuzzy.

• Collections: The attribute may consist of a set of values
or even a set of objects. Imprecision in this kind of
attribute appears at two levels:

o The set may be fuzzy,
o The elements of the set may be fuzzy values or fuzzy
objects.
A method defined in a class is described as follows:

$M_j(N, I, R) \to (u, v, g)$

where:

N: the name of the method.

I: the set of input parameters, {<name, type>}.

R: the set of attributes whose values are read by the method.

u: the set of output parameters, including the return value
type, {<name, type>}.

v: the set of attributes whose values are changed by the
method.

g: the set of messages sent by the method, of the form
{[o, msg, p]}, where o is the recipient of the message, msg is
the message, and p is the set of parameters in the message,
{<n, t>}.

Similar to the object-oriented database model, a fuzzy
object-oriented database is a data model in which the attributes
of the data are fuzzy (or crisp) and the methods operating on
those attributes are packaged in structures called (fuzzy)
objects.



A. Convert the attribute value to interval values 

In this paper, we are only interested in handling interval
values. All attribute values are therefore converted into interval
values so that they can be manipulated easily. The conversion is
as follows:

- If the attribute value is a, it is converted into [a, a].

- If the attribute value is "about a", it is converted into
[a - ε, a + ε], where ε is the radius of the interval centered at
a.

- If the attribute value is "from a to b", it is converted into
[a, b].
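
The three conversion rules above can be sketched in Python as
follows. This is only an illustration: the function name
`to_interval` and the string tags "exact", "about", and "range" are
hypothetical and do not come from the paper.

```python
from typing import Tuple

Interval = Tuple[float, float]


def to_interval(kind: str, a: float, b: float = 0.0, eps: float = 0.0) -> Interval:
    """Convert an attribute value into a closed interval.

    `kind` is a hypothetical tag naming which of the three rules applies.
    """
    if kind == "exact":   # value a          -> [a, a]
        return (a, a)
    if kind == "about":   # "about a"        -> [a - eps, a + eps]
        if eps < 0:
            raise ValueError("eps must be non-negative")
        return (a - eps, a + eps)
    if kind == "range":   # "from a to b"    -> [a, b]
        if a > b:
            raise ValueError("range requires a <= b")
        return (a, b)
    raise ValueError(f"unknown kind: {kind}")
```

For instance, a crisp length of 1.72 becomes (1.72, 1.72), and
"about 2 with radius 0.5" becomes (1.5, 2.5).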

B. Convert the interval values to subintervals of [0, 1]

Let $Dom(A_i) = [min, max]$ be the domain of the values of
attribute $A_i$, where min and max stand for the minimum and
maximum values of $Dom(A_i)$.

Definition 3.1 [9]: The function $f: Dom(A_i) \to [0, 1]$ is
determined by:

$$f(a) = \frac{a - min}{max - min}, \quad \forall a \in Dom(A_i)$$
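
A minimal Python sketch of this normalization is given below,
assuming the linear form of Definition 3.1 (the value minus the
domain minimum, divided by the domain width). The helper name
`normalize` is an assumption made for illustration; the sample
numbers are the first rectangle's length and Dom(LENGTH) = [1.0, 2.0]
from the example in Section IV.

```python
from typing import Tuple

Interval = Tuple[float, float]


def f(a: float, dom_min: float, dom_max: float) -> float:
    """Map a value of [dom_min, dom_max] linearly onto [0, 1]."""
    if dom_max <= dom_min:
        raise ValueError("domain must satisfy min < max")
    if not dom_min <= a <= dom_max:
        raise ValueError("value lies outside the domain")
    return (a - dom_min) / (dom_max - dom_min)


def normalize(interval: Interval, dom_min: float, dom_max: float) -> Interval:
    """Apply f to both end points of an interval value."""
    lo, hi = interval
    return (f(lo, dom_min, dom_max), f(hi, dom_min, dom_max))


lo, hi = normalize((1.65, 1.68), 1.0, 2.0)
print(round(lo, 2), round(hi, 2))
```

Up to floating-point rounding, the result is the subinterval
[0.65, 0.68] listed for object iD1 in the example.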

C. Data search algorithms based on interval values

Query languages for object-oriented databases have attracted the
interest of several authors and have been extended to the fuzzy
object-oriented database model. A fuzzy OQL query has the
structure: select <attributes>/<methods> from <class> where <fc>,
where <fc> is a fuzzy condition or a combination of fuzzy
conditions using disjunction or conjunction operations.

An important issue in a fuzzy OQL query is determining the truth
value of <fc> and the associated truth values. In this paper, we
use the interval-value approach to determine the truth value. For
example, consider the query "show all students who are possibly
young". To answer this query, we find the intersection of two
subintervals of [0, 1]:

+ First subinterval: As shown above, an attribute value falls
into one of four cases; we focus on the second case and, in
particular, on interval values. In the above query, age is an
attribute of the student objects and its value is considered an
interval value. We use Definition 3.1 to convert this interval
into a subinterval of [0, 1].

+ Second subinterval: In the above query, possibly young is the
fuzzy condition, and it is interpreted through the fuzziness of a
term in a complete linear hedge algebra. The fuzzy condition is
therefore also a subinterval of [0, 1] (the fuzziness of a term of
a linear hedge algebra is a subinterval of [0, 1]).

Without loss of generality, we consider the case of multiple
fuzzy conditions, with the following notation:

- $\theta$ is the AND or OR operation.

- $fzvalue_i$ is the fuzzy value of the i-th attribute.

SFTVA algorithm: search data with multiple fuzzy conditions on
attributes, combined by the $\theta$ operation.
Input: A class C consisting of a set of attributes and methods,
$C = \{o_i \mid i = 1..n\}$,
$o_i = \langle \{a_1, a_2, \ldots, a_p\}, M \rangle$,
where $a_i$ is an attribute and M is the set of methods.

Output: All $o \in C$ that satisfy the condition
$\theta_{i=1}^{p}\,(o.a_i = fzvalue_i)$
(where $o.a_i$ is the value of attribute i of object o).

Method

// Initialization.
(1) For i = 1 to p do
(2) Begin
(3)   Set $G_{a_i} = \{0, c^-_{a_i}, W, c^+_{a_i}, 1\}$,
      $H_{a_i} = H^+_{a_i} \cup H^-_{a_i}$,
      where $H^+_{a_i} = \{h_1, h_2\}$, $H^-_{a_i} = \{h_3, h_4\}$,
      with $h_1 < h_2$ and $h_3 > h_4$. Select the fuzziness
      measures of the generating elements and of the hedges.
(4)   $D_{a_i} = [min_{a_i}, max_{a_i}]$
      // min and max values of the domain of $a_i$
(5) End
(6) For each $o \in C$ do
(7)   For i = 1 to p do
(8)     Convert $o.a_i$ into the corresponding interval $[a_i, b_i]$;
// Use function f to convert the interval [a, b] into a
// subinterval of [0, 1]
(9) For each object $o \in C$ do
(10)  For i = 1 to p do
(11)    $o.a_i = [f(a_i), f(b_i)]$;
// Construct the fuzziness intervals $I^k_{a_i}(x_j)$ of the
// level-k partitions
(12) k = 1;
(13) While $k \le 4$ do // the deepest partition level is k = 4
(14) Begin
(15)   For i = 1 to p do
(16)     For j = 1 to $2 \cdot 5^{(k-1)}$ do
(17)       Construct the level-k fuzziness interval $I^k_{a_i}(x_j)$;
(18)   k = k + 1;
(19) End
// Determine the level-k partition containing $fzvalue_i$
(20) For i = 1 to p do
(21) Begin
(22)   t = 0;
(23)   Repeat
(24)     t = t + 1;
(25)   Until $fzvalue_i \in I^k_{a_i}(x_t)$;
(26)   $X_i^k = X_i^k \cup I^k_{a_i}(x_t)$;
(27) End
(28) For each $o \in C$ do
(29)   If $\theta_{i=1}^{p}\,(o.a_i \subseteq X_i^k)$ then
       $\theta_{i=1}^{p}\,(o.a_i = X_i^k)$;
       // o satisfies the query condition



SFTVM algorithm: search data with a single fuzzy condition on a
method.

In the object-oriented database model, a class is defined by a
set of characteristics, including the attributes and methods that
determine the objects of the class. Each method is performed as a
function operating on the attribute values of objects. To search
data in this case, we convert the interval values of the
attributes on which the method operates, together with their
corresponding domains, into subintervals of [0, 1]. We then choose
a combination function of hedge algebras that is consistent with
the operation of the method, so that the domain of the method is
also a subinterval of [0, 1].

Finally, we find the intersection of these two subintervals of
[0, 1].
Input: A class C consisting of a set of attributes and methods,
$C = \{o_i \mid i = 1..n\}$,
$o_i = \langle \{a_1, a_2, \ldots, a_p\}, \{M_1, M_2, \ldots, M_m\} \rangle$,
where $a_i$ is an attribute and $M_j$ is a method.

Output: All $o \in C$ that satisfy the condition
$o.M_i = fzpvalue$ ($o.M_i$ is the return value of the method).

Method

// Initialization.
(1) For i = 1 to p do
(2)   $D_{a_i} = [min_{a_i}, max_{a_i}]$
      // min and max values of the domain of $a_i$
(3) For each object $o \in C$ do
(4)   For i = 1 to p do
(5)     Convert $o.a_i$ into the corresponding interval $[a_i, b_i]$;
// Use function f to convert the interval [a, b] into a
// subinterval of [0, 1]
(6) For each object $o \in C$ do
(7)   For i = 1 to p do
(8)     $o.a_i = [f(a_i), f(b_i)]$;
(9) Determine the combination function of hedge algebras
// Determine the domain of each method
(10) For i = 1 to m do
(11)   $o.M_i = [f(x), f(y)]$;
(12) For i = 1 to m do
(13)   Set $G_{M_i} = \{0, c^-_{M_i}, W, c^+_{M_i}, 1\}$,
       $H_{M_i} = H^+_{M_i} \cup H^-_{M_i}$,
       where $H^+_{M_i} = \{h_1, h_2\}$, $H^-_{M_i} = \{h_3, h_4\}$,
       with $h_1 < h_2$ and $h_3 > h_4$. Select the fuzziness
       measures of the generating elements and of the hedges.
// Construct the fuzziness intervals $I^k_{M_i}(x_j)$ of the
// level-k partitions
(14) k = 1;
(15) While $k \le 4$ do // the deepest partition level is k = 4
(16) Begin
(17)   For i = 1 to m do
(18)     For j = 1 to $2 \cdot 5^{(k-1)}$ do
(19)       Construct the level-k fuzziness interval $I^k_{M_i}(x_j)$;
(20)   k = k + 1;
(21) End
// Determine the level-k partition containing fzpvalue
(22) For i = 1 to m do
(23) Begin
(24)   t = 0;
(25)   Repeat
(26)     t = t + 1;
(27)   Until $fzpvalue \in I^k_{M_i}(x_t)$;
(28)   $Y_i^k = Y_i^k \cup I^k_{M_i}(x_t)$;
(29) End
(30) For each $o \in C$ do
(31)   For i = 1 to m do
(32)     If $(o.M_i \subseteq Y_i^k)$ then $(o.M_i = Y_i^k)$;
         // o satisfies the query condition

Indeed, to find the intersection of the two subintervals of
[0, 1], let $[I_a, I_b]$ be the first subinterval and
$[I_{x1}, I_{x2}]$ the second. We have the following cases:

First case: If $[I_a, I_b] \cap [I_{x1}, I_{x2}] = \emptyset$ then
$[I_a, I_b] \not\subseteq [I_{x1}, I_{x2}]$.
Second case: If $[I_a, I_b] \cap [I_{x1}, I_{x2}] \ne \emptyset$
then one of the following three cases occurs:

a. If $I_{x1} \le I_a$ and $I_b \le I_{x2}$ then
$[I_a, I_b] \subseteq [I_{x1}, I_{x2}]$.

b. If $I_a < I_{x1}$ and $I_{x1} < I_b \le I_{x2}$ then
$[I_a, I_b] \not\subseteq [I_{x1}, I_{x2}]$.

c. If $I_{x1} \le I_a < I_{x2}$ and $I_b > I_{x2}$ then
$[I_a, I_b] \not\subseteq [I_{x1}, I_{x2}]$.

The algorithm always checks whether the subinterval $[I_a, I_b]$
is contained in the subinterval $[I_{x1}, I_{x2}]$.

The computational complexity of the SFTVA algorithm is evaluated
as follows: steps (1)-(5) take O(p), steps (6)-(8) take O(n*p),
steps (9)-(11) take O(n*p), steps (12)-(19) take O(p), steps
(20)-(27) take O(p), and steps (28)-(29) take O(n*p). So the SFTVA
algorithm has computational complexity O(n*p).

The computational complexity of the SFTVM algorithm is evaluated
as follows: steps (1)-(2) take O(p); steps (3)-(5) take O(n*p);
steps (6)-(8) take O(n*p); steps (10)-(11) take O(m); steps
(12)-(13) take O(m); steps (14)-(21) take O(m); steps (22)-(29)
take O(m); and steps (30)-(31) take O(n*m). So the SFTVM algorithm
has computational complexity max(O(n*p), O(n*m)).
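
The case analysis for two subintervals of [0, 1] can be sketched in
Python as below. The function names and the returned labels are
hypothetical; closed intervals are assumed, so two intervals that
share only an end point count as intersecting.

```python
from typing import Tuple

Interval = Tuple[float, float]


def intersects(a: Interval, x: Interval) -> bool:
    """True if the closed intervals a and x share at least one point."""
    return a[0] <= x[1] and x[0] <= a[1]


def contained(a: Interval, x: Interval) -> bool:
    """True if a = [Ia, Ib] lies inside x = [Ix1, Ix2] (second case, a)."""
    return x[0] <= a[0] and a[1] <= x[1]


def classify(a: Interval, x: Interval) -> str:
    """Label the position of a relative to x following the case analysis."""
    if not intersects(a, x):
        return "disjoint"               # first case
    if contained(a, x):
        return "contained"              # second case, a
    return "overlap-not-contained"      # second case, b and c
```

With the normalized values of the example in Section IV,
`classify((0.65, 0.68), (0.60, 0.68))` returns "contained" and
`classify((0.72, 0.72), (0.60, 0.68))` returns "disjoint".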



Theorem: The SFTVA and SFTVM algorithms always terminate and are
correct.

Proof:

1. Termination: Each algorithm stops when all objects have been
examined.

2. Correctness: Each algorithm always checks whether the two
subintervals intersect.



IV. Example 

We consider a database with six rectangle objects as follows:

Rectangular
iDhcn  name  length of edges  width of edges  area()
iD1    hcn1  [1.65, 1.68]     [1.3, 1.4]
iD2    hcn2  1.72             [1.48, 1.5]
iD3    hcn3  [1.7, 1.75]      1.72
iD4    hcn4  1.67             [1.2, 1.3]
iD5    hcn5  [1.2, 1.3]       1.4
iD6    hcn6  1.6              [1.36, 1.48]


Query 1: List the rectangles whose length is "less long" and whose
width is "possibly short".

To answer Query 1 we proceed as follows:

Step (1)-(5):

Consider a linear hedge algebra of length,
$X_{length} = (X_{length}, G_{length}, H_{length}, \le)$, where
$G_{length} = \{S, L\}$, with S and L standing for short and long,
$H^+_{length} = \{M, V\}$, and $H^-_{length} = \{P, L\}$, where P,
L, M and V stand for Possibly, Little, More and Very.

Suppose that $W_{length} = 0.6$, fm(short) = 0.6, fm(long) = 0.4,
μ(V) = 0.35, μ(M) = 0.25, μ(P) = 0.2, μ(L) = 0.2.

Dom(LENGTH) = [1.0, 2.0].
Step (6)-(11):

Rectangular
iDhcn  name  length of edges  width of edges  area()
iD1    hcn1  [0.65, 0.68]     [0.3, 0.4]
iD2    hcn2  [0.72, 0.72]     [0.48, 0.5]
iD3    hcn3  [0.7, 0.75]      [0.72, 0.72]
iD4    hcn4  [0.67, 0.67]     [0.12, 0.13]
iD5    hcn5  [0.12, 0.13]     [0.12, 0.12]
iD6    hcn6  [0.6, 0.6]       [0.38, 0.48]


Step (12)-(19): Since "less long" (LL) and "possibly short" (PS)
both lie at the second level of partitioning, we only build two
levels of partitioning.

We have fm(VL) = 0.14, fm(ML) = 0.1, fm(LL) = 0.08,
fm(PL) = 0.08.

Since LL < PL < L < ML < VL, we have I(VL) = [0.86, 1],
I(ML) = [0.76, 0.86], I(PL) = [0.68, 0.76], I(LL) = [0.60,
0.68].

We have fm(VS) = 0.21, fm(MS) = 0.15, fm(LS) = 0.12,
fm(PS) = 0.12.

Since VS < MS < S < PS < LS, we have I(VS) = [0, 0.21],
I(MS) = [0.21, 0.36], I(PS) = [0.36, 0.48], I(LS) = [0.48, 0.6].
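
The level-2 intervals of this step can be reproduced with a short
Python sketch. It assumes that each child interval has length
μ(h) · fm(x), as in Definition 2.2, and that the children are laid
out left to right in the term order stated above. The function name
`partition` and the dictionary `MU` are illustrative only.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]

# Hedge fuzziness measures taken from the example (they sum to 1).
MU: Dict[str, float] = {"V": 0.35, "M": 0.25, "P": 0.2, "L": 0.2}


def partition(parent: Interval, order: List[str]) -> Dict[str, Interval]:
    """Split `parent` into consecutive sub-intervals, one per hedge.

    `order` lists the hedges from left to right; each piece has length
    mu(h) * |parent|, i.e. fm(hx) = mu(h) * fm(x).
    """
    lo, hi = parent
    length = hi - lo
    result: Dict[str, Interval] = {}
    start = lo
    for h in order:
        end = start + MU[h] * length
        # Rounding only removes floating-point noise in the end points.
        result[h] = (round(start, 10), round(end, 10))
        start = end
    return result


# Level 1: I(short) = [0, 0.6] and I(long) = [0.6, 1], since fm(short) = 0.6.
short_parts = partition((0.0, 0.6), ["V", "M", "P", "L"])  # VS < MS < PS < LS
long_parts = partition((0.6, 1.0), ["L", "P", "M", "V"])   # LL < PL < ML < VL

for name, parts in (("S", short_parts), ("L", long_parts)):
    for h, (a, b) in parts.items():
        print(f"I({h}{name}) = [{a:.2f}, {b:.2f}]")
```

Under these assumptions the printed intervals coincide with the
values of I(VS), I(MS), I(PS), I(LS) and I(LL), I(PL), I(ML), I(VL)
given in this step.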
Step (20)-(27): Determine the partitions containing "less long"
and "possibly short".

$X^k = I(LL) = [0.60, 0.68]$ and $Y^k = I(PS) = [0.36, 0.48]$.
Step (28)-(29): According to the conditions:

• The length is "less long", so three objects satisfy the
condition: iD1, iD4, iD6.

• The width is "possibly short", so two objects satisfy the
condition: iD1, iD6.

So two objects, iD1 and iD6, satisfy the query with the AND
operation.

Query 2: List the rectangles whose area is "less small".

To answer Query 2 we proceed as follows:

Step (1)-(2): Dom(LENGTH) = [1.0, 2.0].

Step (9): The method area() computes length × width, so in this
case we select the following combination function of hedge
algebras:

$f(x) = f(a_1) \times f(a_2)$
$f(y) = f(b_1) \times f(b_2)$

where:
- f(x) and f(y) are the lower and upper bounds of the domain of
the method area();
- $f(a_1)$, $f(b_1)$ and $f(a_2)$, $f(b_2)$ are the lower and
upper bounds of the length and width attributes, respectively.

Step (3)-(8), (10)-(11):

Rectangular
iDhcn  name  length of edges  width of edges  area()
iD1    hcn1  [0.65, 0.68]     [0.3, 0.4]      [0.2, 0.27]
iD2    hcn2  [0.72, 0.72]     [0.48, 0.5]     [0.35, 0.36]
iD3    hcn3  [0.7, 0.75]      [0.72, 0.72]    [0.5, 0.54]
iD4    hcn4  [0.67, 0.67]     [0.12, 0.13]    [0.08, 0.09]
iD5    hcn5  [0.12, 0.13]     [0.12, 0.12]    [0.01, 0.02]
iD6    hcn6  [0.6, 0.6]       [0.38, 0.48]    [0.23, 0.29]

Step (12)-(13):

Consider a linear hedge algebra of size,
$X_{size} = (X_{size}, G_{size}, H_{size}, \le)$, where
$G_{size} = \{S, L\}$, with S and L standing for small and large,
$H^+_{size} = \{M, V\}$, and $H^-_{size} = \{P, L\}$, where P, L,
M and V stand for Possibly, Little, More and Very.

Suppose that $W_{size} = 0.6$, fm(small) = 0.6, fm(large) = 0.4,
μ(V) = 0.35, μ(M) = 0.25, μ(P) = 0.2, μ(L) = 0.2.

Step (14)-(21): Since "less small" (LS) lies at the second level
of partitioning, we only build two levels of partitioning.

We have fm(VL) = 0.14, fm(ML) = 0.1, fm(LL) = 0.08,
fm(PL) = 0.08.

Since LL < PL < L < ML < VL, we have I(VL) = [0.86, 1],
I(ML) = [0.76, 0.86], I(PL) = [0.68, 0.76], I(LL) = [0.60, 0.68].

We have fm(VS) = 0.21, fm(MS) = 0.15, fm(LS) = 0.12,
fm(PS) = 0.12.

Since VS < MS < S < PS < LS, we have I(VS) = [0, 0.21],
I(MS) = [0.21, 0.36], I(PS) = [0.36, 0.48], I(LS) = [0.48, 0.6].

Step (22)-(29): Determine the partition containing "less small".

$X^k = I(LS) = [0.48, 0.60]$.

Step (30)-(31): According to the condition that the rectangle's
area is "less small", one object satisfies the query: iD3.
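
The evaluation of Query 2 can be sketched in Python as follows. The
sketch takes the normalized length and width intervals exactly as
tabulated in the example, bounds area() by the products of the lower
and of the upper end points, and keeps the objects whose area
interval is contained in I(LS) = [0.48, 0.60]. The names `RECTANGLES`,
`area`, and `query_area` are hypothetical.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]

# Normalized (length, width) intervals as tabulated in the example.
RECTANGLES: Dict[str, Tuple[Interval, Interval]] = {
    "iD1": ((0.65, 0.68), (0.3, 0.4)),
    "iD2": ((0.72, 0.72), (0.48, 0.5)),
    "iD3": ((0.7, 0.75), (0.72, 0.72)),
    "iD4": ((0.67, 0.67), (0.12, 0.13)),
    "iD5": ((0.12, 0.13), (0.12, 0.12)),
    "iD6": ((0.6, 0.6), (0.38, 0.48)),
}


def area(length: Interval, width: Interval) -> Interval:
    """Bounds of area(): product of lower bounds, product of upper bounds.

    This is valid here because all normalized bounds are non-negative.
    """
    return (length[0] * width[0], length[1] * width[1])


def contained(a: Interval, x: Interval) -> bool:
    """True if interval a lies inside interval x."""
    return x[0] <= a[0] and a[1] <= x[1]


def query_area(target: Interval) -> List[str]:
    """Objects whose area interval is contained in the target interval."""
    return [oid for oid, (length, width) in RECTANGLES.items()
            if contained(area(length, width), target)]


I_LS = (0.48, 0.60)       # fuzziness interval of "less small"
print(query_area(I_LS))   # prints ['iD3']
```

The area bounds computed this way agree with the area() column of
the table after rounding to two decimals.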



V. Conclusion 

In this paper, we proposed a new method for manipulating data
with interval values in an object-oriented database whose
information is fuzzy and uncertain. The approach is based on the
quantitative semantics of hedge algebras. With this approach, data
manipulation is easy because interval values are converted into
subintervals of [0, 1]. The fuzziness of a term in a hedge algebra
is also a subinterval of [0, 1], so comparing interval values with
fuzziness measures in hedge algebras reduces to comparing two
subintervals of [0, 1]. We proposed a way to compute the values of
a class's methods by using a combination of hedge algebras. Based
on the comparison of interval values, we proposed two algorithms,
SFTVA and SFTVM, for searching data with fuzzy conditions on
attributes and on methods.
Abstract - The well drilling process has become very complex and
requires choosing a justified solution from a set of possible
ones, because the large volume of data received and processed
gives rise to a wide range of problem situations. Information
support of the drilling process is therefore important for
effective human-machine decision-making. The complexity of
drilling inclined, horizontal, and sectional wells, including on
the ocean shelf, requires an adequate response in operational
(on-line) control of the drilling process. The implementation of
computer-aided control systems depends in many respects on
progress in the computers used to conduct a dialogue in an
interactive system of automated control.

Keywords - decision-making, drilling process, inclinometric
data, automated control, information system, well trajectory,
azimuth and zenith angles, plane projection.



I. Introduction



This work describes methods and tools for processing, presenting,
and interpreting on-line inclinometric drilling data. It should be
noted that the problems of inclinometric data processing are not
directly addressed by recognition methods [1]. These problems are
nevertheless included, on the one hand, to cover drilling problems
more completely and because of their importance given the growing
interest in slant and horizontal drilling in particular. On the
other hand, evaluating the results of actual drilling is also a
form of classification: an appraisal of the situation, which is a
very important part of decision-making.



II. Discussion



In view of the above, and applying basic methods and
mathematical relationships for estimating the final values of the
azimuth and zenith angles, methods and tools were proposed for
plotting the design and actual paths of wells in space and in the
vertical and horizontal planes, viewing them from different sides,
and updating the data in real time, and consequently for
predicting the path and making on-line decisions.



The process of obtaining data on the spatial location of a
bore-hole includes two stages: obtaining the initial inclinometric
information with the help of various technical means, and
processing this information. The role of processing is
considerable. The main objective of processing is to determine the
location of the bore-hole, and by applying an appropriate
calculation method we can obtain more accurate results with the
same number of measurement points. Different mathematical methods
are available for plotting a bore-hole path from the results of
inclinometric measurements. However, the problems of processing
are much wider.
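
As an illustration of one such calculation method, the Python
sketch below applies the average angle method, a common textbook
technique for turning survey stations into a path. It is not stated
to be the method used by the program described here. The coordinate
convention (north, east, vertical depth), the station layout, and the
function name `average_angle_path` are assumptions.

```python
import math
from typing import List, Tuple

# One survey station: (measured depth, zenith angle in degrees, azimuth in degrees).
Station = Tuple[float, float, float]


def average_angle_path(stations: List[Station]) -> List[Tuple[float, float, float]]:
    """Return (north, east, vertical depth) at each station, starting at the origin."""
    north = east = tvd = 0.0
    path = [(north, east, tvd)]
    for (md1, z1, a1), (md2, z2, a2) in zip(stations, stations[1:]):
        d_md = md2 - md1
        zen = math.radians((z1 + z2) / 2.0)
        # Naive azimuth average: assumes the two azimuths do not
        # straddle the 0/360 degree boundary.
        azi = math.radians((a1 + a2) / 2.0)
        north += d_md * math.sin(zen) * math.cos(azi)
        east += d_md * math.sin(zen) * math.sin(azi)
        tvd += d_md * math.cos(zen)
        path.append((north, east, tvd))
    return path
```

For a vertical section (zenith angle 0 at both stations) the whole
measured length goes into vertical depth, while a horizontal section
heading due east (zenith angle 90, azimuth 90) adds only to the east
coordinate.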
The problems of on-line control are closely connected with the
problems of designing an optimal profile and with the problems of
on-line management of slant hole drilling. In fact, control and
management can be considered as two subsystems of a single system
for the control and management of the drilling process [2].

The methods and tools described in this paper make it possible to
solve the following problems of inclinometric data processing and
design:

- input of the parameters of a design profile;

- calculation of the design profile of a bore-hole;

- input, arrangement, and merging of the databases obtained in
multiple measurements;

- accumulation of information on wells;

- monitoring of the current location of the well bottom;

- plotting of horizontal and vertical views of a well;

- plotting of the bore-hole path in spatial coordinates
(x, y, z);

- comparison of the actual bore-hole path with the design one
and detection of dangerous deviations from the design;

- recommendations on the zenith angle and azimuth for connecting
the actual bore-hole bottom with the design one by a straight
line;

- preparation of reports.
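
The recommendation of a zenith angle and azimuth for a straight
line between the actual and the design bottom reduces to elementary
geometry, sketched below in Python. The coordinate convention (north,
east, vertical depth increasing downwards, azimuth measured clockwise
from north) and the function name `direction_to_target` are
assumptions made for this illustration.

```python
import math
from typing import Tuple

# (north, east, vertical depth): an assumed coordinate convention.
Point = Tuple[float, float, float]


def direction_to_target(actual: Point, design: Point) -> Tuple[float, float]:
    """Zenith angle and azimuth (degrees) of the straight line actual -> design."""
    d_north = design[0] - actual[0]
    d_east = design[1] - actual[1]
    d_depth = design[2] - actual[2]
    horizontal = math.hypot(d_north, d_east)
    if horizontal == 0.0 and d_depth == 0.0:
        raise ValueError("points coincide; direction is undefined")
    # Zenith angle: angle between the line and the downward vertical.
    zenith = math.degrees(math.atan2(horizontal, d_depth))
    # Azimuth: clockwise from north, reduced to [0, 360).
    azimuth = math.degrees(math.atan2(d_east, d_north)) % 360.0
    return zenith, azimuth
```

A design point directly below the actual bottom gives a zenith angle
of 0, and a design point at the same depth due east gives a zenith
angle of 90 and an azimuth of 90.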

To fulfil a project assignment for the construction of a well,
i.e., to drill a bore-hole along a design path and hit the set
point of penetration of a producing formation with minimum
deviations, the technologist should be able to monitor the
bore-hole path continuously and detect any deviations. With this
capability, the technologist can take timely management decisions
and, on their basis, make the necessary changes to the controlled
object [3], the drilling process.



The program, developed in the Delphi environment, makes it
possible to show the actual and design bore-hole paths both
projected onto vertical and horizontal planes and
axonometrically (a spatial representation), to estimate the
parameters necessary for monitoring bore-hole drilling, and to
collect, store, and present information.



The module for interpretation of inclinometric data (Fig. 1)
consists of three modules: an initial data input module; a
module for algorithmic calculations (Fig. 2); and an information
output module (Fig. 3).




Figure 1 Graphic interpretation of inclinometric data

Figure 2 Module for algorithmic calculations (initial and estimated
parameters of the spatial location of a well)
[Screenshot: table of measurement points (number, length, zenith
angle, azimuth, Wx, Wy, Wz) with a panel of calculation results]

Figure 3 Information output module (3D well trajectory, plane
projection)



In the near future, the work will be continued to develop an
information system for processing geological and technological
data [4].



III. Conclusion



In this paper the following results were obtained:

On the basis of the available mathematical software for
processing inclinometric data, a program was developed for
displaying the axonometric paths (trajectories) of the design and
actual wells, rotating them around the vertical axis, selecting
projections onto horizontal and vertical planes, scaling selected
parts of the paths, showing changes in the azimuth and zenith
angles, and predicting these changes in relation to the assumed
zone in which the path hits the assigned area.
Abstract - Malware has become an epidemic in computer networks.
Malware attacks are a significant threat to networks, and a
survey shows that they may have a huge financial impact. The
situation has worsened as users migrate to a new environment,
Internet Protocol version 6 (IPv6). In this paper, a real Nimda
worm was released in order to further understand worm behavior in
real network traffic. Controlled IPv4 and IPv6 network
environments were deployed as a testbed for this study. The
results from the two scenarios are analyzed and discussed in
terms of worm behavior. The experimental results show that even
IPv4 malware can still infect an IPv6 network environment without
any modification. New detection techniques need to be proposed to
remedy this problem swiftly.

Keywords-IPv6, malware, IDS. 

I. Introduction 

IPv6 is a new network protocol intended to overcome the problems
of IPv4. The new protocol offers many advantages, including 1) a
large address space with a flexible addressing scheme, 2) more
efficient packet forwarding, 3) support for secure
communication, and 4) better support for mobility, among others
[1]. Although IPv6 offers many benefits, people are still
reluctant to migrate completely from IPv4 to IPv6 networks. This
is because, even though IPv6 has been deployed for many years,
the protocol is still considered to be in its infancy [2]. Many
researchers have spent ample time enhancing IPv6 services to
bring them at least on par with IPv4. Since IPv4 addresses are
facing depletion, migrating to IPv6 is eventually inevitable
[3-5]. Some studies claim that IPv6 causes many security issues
[6-9]. Unfortunately, researchers have paid little attention to
IPv6 security issues [10]. Thus, some culprits are eager to fully
utilize the vulnerabilities that occur during this transition
period. Producing malware is one of the most popular techniques
used. Studies show that new-age malware can survive in the new
network environment [11, 12]. Hence, researchers agree that
further studies have to be conducted to remedy malware infection
issues [13-16].

Malware is software that is rapidly invented to exploit
vulnerabilities of computer networks. According to [17], 250
new malware variants are introduced every day from all over
the world. This so-called new-age malware is not genuinely new
but is derived from existing malware. The existing malware is
modified, and some modules are added to it, to avoid detection
by anti-virus software that uses signature patterns to detect
malware.

Malware has become an epidemic in computer networks nowadays
[18]. Malware attacks are a significant threat to networks. A
survey shows that malware attacks may have a huge financial
impact [19]. The situation is becoming worse as users migrate
to a new environment, Internet Protocol Version 6.

The objectives of this study are to determine whether an 
IPv6 network is totally safe from attacks which were in- 
tended for IPv4 network and to identify malware behavior 
in different network environments. 

In the following sections, we describe work related to this
study, followed by the methodology used in this experimental
research. The experimental design is then explained, and the
results and analysis are discussed. Finally, the conclusion of
the overall study is stated at the end of this paper.



II. Related work 

A. Malware 

Malware takes several forms, namely viruses, Trojans, spyware,
adware and worms [20, 21]. Each has different characteristics
in attacking its victims. Their methods of propagation also
vary, including sharing memory sticks, downloading files,
peer-to-peer applications, file sharing and many more.

B. Malware Propagation Methods 

Many activities can help malware propagate more easily.
Unfortunately, most end users are not fully aware of them due
to a lack of knowledge about this issue. We classify
propagation into two categories, namely 1) human intervention
and 2) self-propagation.

Most malware spreads through human intervention. These
activities include transferring viruses via memory sticks,
installing peer-to-peer applications, downloading files that
contain malware and sending or forwarding malware emails.
Malware in this category includes viruses, Trojans, spyware
and adware. Since its propagation is based on human
intervention, the spreading rate cannot be determined, because
the key factor in spreading the virus is very subjective. If
the malware is transferred rapidly by victims, the spreading
rate is very high. However, if it is just left in the computer
without being executed, the malware stays dormant and the
spreading rate is low.
except that the protocol used to communicate between computers
is different. The testbed design for this study can be found
in Figure 2.



Before the worm is released, a clean testbed needs to be
ready. Some worms remain in memory even after the virus has
been cleaned by the antivirus software. Therefore, each
computer is cleaned thoroughly, including formatting all
computers involved, to ensure that no other factors affect the
result later on. The original configuration of the computers,
router and switch involved is restored.



The other propagation category is self-propagation. The only
malware that falls in this category is the worm. This is
because the spreading method is pre-defined and hardcoded in
the worm software so that it can launch the attack by itself
without any human intervention. Worms normally scan for
victims before initiating the first attack. Therefore, worm
spreading can be determined technically. However, it is not
easy to determine, because each worm uses a different scanning
method to search for its victims.



C. Malware Scanning Methods 

Worm scanning methods can be divided into three categories, as
defined by [22]: 1) naive random scanning, 2) sequential
scanning and 3) localized scanning. The first scanning method
selects its targets regardless of any information about the
victim's network. An example of a worm that uses this
technique is Slammer. The second scanning method searches for
vulnerable hosts through their closeness in IP address space,
based on the host configuration. The Blaster worm is an
example that uses this technique to attack its victims.
Finally, the last scanning method preferentially searches for
vulnerable hosts in the local subnetwork. It uses the victim's
network information to initiate the attack. The Nimda worm is
an example that uses this technique to attack its victims.

We believe the localized scanning method is very dangerous,
since it uses information about the current network to launch
its attack, and the result can be disastrous. What is more,
this kind of worm can survive in a new network environment,
for example an IPv6 network environment. In this paper, Nimda
variant E is released in both IPv4 and IPv6 network
environments to see how the worm works and how it affects
network performance.
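The three scanning categories can be contrasted with a small
address-selection model, of the kind used in propagation
simulations. This is only a sketch under stated assumptions:
the function names and the local_bias parameter are
hypothetical, the code sends no traffic, and it is not taken
from [22] or from any real worm.

```python
import ipaddress
import random


def naive_random_target(rng):
    """Naive random scanning: any IPv4 address, ignoring the local network."""
    return ipaddress.IPv4Address(rng.getrandbits(32))


def sequential_targets(start, count):
    """Sequential scanning: walk upward through addresses close to a start."""
    first = ipaddress.IPv4Address(start)
    return [first + i for i in range(1, count + 1)]


def localized_target(own_network, rng, local_bias=0.75):
    """Localized scanning: prefer the local subnet with probability local_bias.

    local_bias is an assumed illustrative value, not a measured one.
    """
    net = ipaddress.ip_network(own_network)
    if rng.random() < local_bias:
        # Skip the network address and the broadcast address.
        offset = rng.randrange(1, net.num_addresses - 1)
        return net.network_address + offset
    return naive_random_target(rng)


if __name__ == "__main__":
    rng = random.Random(1)
    local = ipaddress.ip_network("10.1.1.0/24")
    picks = [localized_target("10.1.1.0/24", rng) for _ in range(1000)]
    share = sum(p in local for p in picks) / len(picks)
    print(f"share of modeled probes inside {local}: {share:.2f}")
```

In such a model, the share of probes that stay inside the
local subnet is the quantity that separates localized scanning
from the other two categories.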

III. METHODOLOGY 

In this study, we planned a workflow in order to obtain our
expected results. The methodology used for this study is
depicted in Figure 1.

In order to test the IPv4 worm behavior in both IPv4 and IPv6
network environments, two testbeds have been implemented. The
computer setup and configuration are identical



After the clean testbed is ready, the packet sniffer node is
activated to capture all packets passing through the gateway
router. The gateway router is involved in this experiment to
simulate an environment that is accessible to other networks.
This stimulates the worm to launch its attack on a broader
scale rather than on the local area network only.



[Figure 1 flowchart. Recoverable steps: Start; Prepare the
clean testbed; Identify the attack pattern on IPv6 nodes;
Identify the network traffic behavior; Identify how to
differentiate between normal and anomalous network traffic;
Conduct further research to find malware which will affect
IPv6 nodes]

Figure 1 : Research Methodology

Since worms in IPv6 are still new, we expect one of two
different results, depending on the worm behavior. In the
first, the worm survives in the IPv6 network environment and
attacks IPv6 nodes directly. If this is the case, the attack
pattern can easily be determined from the changes that happen
in the affected nodes. However, if the worm does not affect
the IPv6 nodes, we check whether the worm affects the network
bandwidth. If the worm is consuming bandwidth, the anomaly
pattern needs to be determined later on. Otherwise, the worm
can be considered totally dormant in the IPv6 network.
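The expected outcomes above form a small decision procedure. A
minimal sketch follows, assuming hypothetical names and
assuming that traffic above the clean-network baseline is what
counts as bandwidth consumption; the paper does not define a
numeric criterion.

```python
def classify_outcome(ipv6_nodes_infected, observed_pps, baseline_pps):
    """Map the observations of one IPv6 run to the expected outcomes.

    ipv6_nodes_infected: True if changes were found on the IPv6 nodes.
    observed_pps: packets per second seen after the worm was released.
    baseline_pps: packets per second of the clean (ideal) network.
    """
    if ipv6_nodes_infected:
        return "direct attack: derive the attack pattern from the nodes"
    if observed_pps > baseline_pps:
        return "bandwidth consumption: determine the anomaly pattern"
    return "dormant"


if __name__ == "__main__":
    print(classify_outcome(False, observed_pps=55, baseline_pps=3))
```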
S7: Unplug all cables connected to the computers to stop the
simulation and save the network traffic log from PC1 for
further analysis.

S8: Before starting the next experiment session, format all
computers to ensure they are free from worm infection in the
operating system and in memory.



IV. Experiment Design 

In this experiment, we used the network layout depicted in
Figure 2:



[Figure 2 diagram. Recoverable labels: Gateway Router with
interfaces Fa0/0 and Fa0/1; PC2; PC3. Network address: 1st
scenario 10.1.1.0/24; 2nd scenario 2001:1:1:1::0/64]

Figure 2: Testbed Network Layout

As shown in Figure 2, three computers were set up in this
testbed, namely PC1, PC2 and PC3. Packet sniffer software was
installed on PC1 to capture all traffic through the gateway
router trunk. PC2 and PC3 work as nodes in the same network,
with PC2 as the source that releases the worm. These computers
used Windows XP SP1 as their operating system, and Nimda
variant E was used as the worm in the experiment.

The procedure of this experiment is as follows:

S1: Prepare all computers, the router and the switch. Restore
the default configurations of the computers, router and
switch.

S2: Activate the packet capture software on PC1 to start
capturing the ideal network pattern.

S3: Leave the computers for a few minutes to ensure the
network traffic has become stable.

S4: Release the Nimda.E worm from PC2.

S5: Wait a few seconds until the worm can be seen starting to
infect the network.

S6: Leave the computers for a few minutes to ensure the worm
has fully infected the network.



V. Result & Analysis 

A. The First Scenario 

In this scenario, the IPv4 network protocol is used. The
network address used for this scenario is 10.1.1.0/24. Before
the worm was released, the ideal network traffic pattern was
captured as a benchmark. Figure 3 shows the benchmark of an
ideal network traffic pattern.




Figure 3: Ideal Network Traffic Pattern for IPv4 network 



Figure 3 plots the number of packets captured through the
gateway router per second. For an ideal network, the traffic
through the gateway router interface is less than 3 packets
per second, as depicted in Figure 3. These packets were sent
for network information convergence.

After the network became stable, the worm was released into
the network. After the worm was released, the number of
packets received by the gateway router increased sharply, as
depicted in Figure 4. A sample of the captured packets is
depicted in Figure 5.




Figure 4: Network traffic pattern after the Nimda.E worm was
released in the IPv4 network

After the network became stable, the worm was released into
the network. After the worm was released, the number of
packets received by the gateway router increased sharply, as
depicted in Figure 7. A sample of the captured packets is
depicted in Figure 8.



Figure 4 plots the number of packets captured through the
gateway router per second. After the worm was released, the
number of packets through the gateway router increased
dramatically, to almost 55 packets per second, as depicted in
Figure 4. Meanwhile, Figure 5 shows a sample of the packets
captured after the worm was released. It appears that the worm
produced TCP flooding, and that those packets were generated
by one IP address, which belongs to the infected computer. We
conclude that after a computer is infected by the Nimda.E
worm, it opens a massive number of TCP connections to its
potential victims, based on the network address information of
the infected computer.
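The per-second packet counts plotted in the figures can be
derived from capture timestamps and compared with the
clean-network benchmark. The sketch below is an illustration
only: the function names and the sample timestamps are
hypothetical, and the default baseline of 3 packets per second
is the ideal-network figure reported above, used here as an
assumed alert threshold.

```python
from collections import Counter


def packets_per_second(timestamps):
    """Count captured packets in each whole-second bucket of the capture."""
    counts = Counter(int(t) for t in timestamps)
    if not counts:
        return []
    first, last = min(counts), max(counts)
    # Seconds without any packet are reported as 0.
    return [counts.get(second, 0) for second in range(first, last + 1)]


def anomalous_seconds(pps, baseline=3):
    """Return the indices of seconds whose packet count exceeds the baseline."""
    return [i for i, n in enumerate(pps) if n > baseline]


if __name__ == "__main__":
    # Hypothetical capture: quiet at first, then a burst in the fourth second.
    sample = [0.1, 0.5, 1.2, 3.0, 3.1, 3.2, 3.3, 3.4]
    rate = packets_per_second(sample)
    print(rate, anomalous_seconds(rate))
```

A fixed threshold is the simplest possible rule; it only marks
where the traffic departs from the benchmark and says nothing
about the packet type.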

B. The Second Scenario 

In this scenario, the network layout and the computer setup
were identical to those of the previous scenario. The only
difference was that the computers used the IPv6 network
protocol instead of IPv4. The network address for this
scenario is 2001:1:1:1::0/64. As in the previous scenario, the
ideal network traffic pattern was captured as a benchmark; it
is depicted in Figure 6:

Figure 7: Network traffic pattern after the Nimda.E worm was
released in the IPv6 network

Figure 6: Ideal Network Traffic Pattern for IPv6 network



Figure 7 plots the number of packets captured through the
gateway router per second. After the worm was released, the
number of packets through the gateway router increased
severely, to almost 55 packets per second, as shown in
Figure 7. Figure 8 shows a sample of the packets captured
after the worm was released. Whereas in IPv4 the worm produced
TCP flooding, in IPv6 it produced ARP flooding instead. We
believe this is because the worm was trying to attack its
victims in the IPv4 network even though it was released in an
IPv6 network environment. We realized the infected computer is
not using



Figure 6 plots the number of packets through the gateway
router per second. As in the previous scenario, in an ideal
network the traffic through the gateway router is less than 3
packets per second, which were used for network information
convergence.



C. The Experiment Result Analysis 

After all the experiments were done, we gathered all the
information for further analysis. Figure 9 shows the
comparison between the numbers of packets released in the
different scenarios.

Figure 9: The average packets released based on different
scenarios

Figure 9 compares the numbers of packets released in three
different scenarios. The first line is the average number of
packets released per second after the worm infected the IPv4
network. The second line is the average number of packets
released per second after the worm infected the IPv6 network.
The last line is the average number of packets released on an
ideal network. Since the number of packets released in an
ideal network is identical for the IPv4 and IPv6 networks,
this information is represented by one scenario only.

From Figure 9, we can see that the number of packets increases
sharply after the worm is released, compared to an ideal
network, regardless of whether the network protocol used is
IPv4 or IPv6. However, the number of packets released in IPv4
is slightly higher than in IPv6. This is probably because the
router needs more time to process the address information in
IPv6 due to its long IP addressing scheme. Moreover, the type
of packet released also differs: in IPv4 the worm opened TCP
connections to its victims, whereas in IPv6 the worm sent ARP
packets to reach its victims, as depicted in Figure 5 and
Figure 8. The comparison is compiled in Table 1.

Table 1: Comparison Between Different Scenarios

                                               Ideal Network             Infected IPv4 Net    Infected IPv6 Net
Maximum number of packets released (per sec)   3                         55                   55
Average packet released per second             Low                       Slightly Higher      High
Type of packet                                 Network Discovery (ND)    ND & TCP             ND & ARP
Type of attack                                 None                      TCP Flooding         ARP Flooding



D. The Experiment Findings 

After the two scenarios were executed and analyzed, we
compiled our conclusions for this study as follows:

• Even when an IPv6 node is infected, the worm still looks for
its victims in the IPv4 network. This shows that IPv4 malware
can still survive in an IPv6 network environment without any
modification to the existing worm.

• In an IPv4 network, the Nimda worm releases TCP flooding
attacks, whereas in an IPv6 network the worm behaves
differently by releasing ARP flooding attacks.

• An IPv4 worm will not directly infect the IPv6 nodes, but it
will consume IPv6 network bandwidth. IPv6 does not seem
totally immune to attack, even when the attack was intended
for an IPv4 network. The situation will become worse if the
network uses a transition mechanism to communicate between the
IPv4 and IPv6 network protocols.

VI. CONCLUSION 

Migrating from IPv4 to IPv6 is inevitable. Many researchers
put a lot of effort into making IPv6 services and stability
much better than those of IPv4. However, not many researchers
pay enough attention to security issues. Malware has a severe
impact on the network, which causes a lot of trouble for end
users. This paper shows that malware invented for IPv4
networks can still penetrate and survive in an IPv6 network
without any modification to the existing malware. This issue
will be worse if the organization uses a transition mechanism
for communication between its IPv4 and IPv6 nodes.

For further research, a more realistic testbed needs to be
used to represent a real network environment. A study of how
this worm behaves under a transition mechanism such as
dual-stack needs to be conducted to further understand how it
works. Finally, a new detection technique needs to be proposed
to address this issue.
Abstract - The development of computer technology in chemistry
has brought many chemistry applications, not only for
visualizing the structure of molecules but also for molecular
dynamics simulation. One of them is Gromacs. Gromacs is a
molecular dynamics application developed by Groningen University.
It is a non-commercial application that runs on the Linux
operating system. The main capabilities of Gromacs are molecular
dynamics simulation and energy minimization. In this paper, the
author discusses how Gromacs works in molecular dynamics
simulation. In a molecular dynamics simulation, Gromacs does not
work alone; it interacts with Pymol and Grace. Pymol is an
application for visualizing molecular structure, and Grace is a
Linux application for displaying graphs. Both applications
support the analysis of molecular dynamics simulations.

Keywords-molecular dynamics; Gromacs; Pymol; Grace

I. Introduction 

Computers are necessary in the life of society, especially in
chemistry. Many non-commercial chemistry applications are now
available for both Windows and Linux. These applications are
very useful not only for visualizing molecular structure but
also for molecular dynamics simulation.

Molecular dynamics is a computer simulation method that
represents the interactions of atoms and molecules over a
certain period of time. The molecular dynamics technique is
based on Newton's laws and the laws of classical mechanics.
Gromacs is one of the applications able to perform molecular
dynamics simulation based on Newton's equations. Gromacs was
first introduced by Groningen University as a molecular dynamics
simulation engine.

This paper focuses on the usage of the Gromacs application. We
describe how to install Gromacs, Gromacs concepts, file formats
in Gromacs, programs in Gromacs, and the analysis of simulation
results.

II. Theories 

A. Protein 

A protein is a complex organic compound with a high molecular
weight. A protein is also a polymer of amino acids linked to one
another by peptide bonds.

Protein structure is divided into four levels, namely primary,
secondary, tertiary and quaternary structure. The primary
structure is the amino acid sequence of a protein, linked
through peptide bonds.

The secondary structure is the local three-dimensional structure
of a range of amino acids in a protein, stabilized by hydrogen
bonds.

The tertiary structure is a combination of different secondary
structures that produces a three-dimensional form. The tertiary
structure is usually a compact lump. Several protein molecules
can interact physically without covalent bonds to form a stable
oligomer (e.g. dimer, trimer, or tetramer) and form a quaternary
structure (e.g. rubisco and insulin).

B. Molecular Dynamics 

Molecular dynamics is a method for investigating the structure
of solids, liquids, and gases. Generally, molecular dynamics
uses Newton's equations and classical mechanics.

Molecular dynamics was first introduced by Alder and Wainwright
in the late 1950s, who used the method to study the interactions
of hard spheres. From these studies, they learned about the
behavior of simple liquids. In 1964, Rahman carried out the
first simulation using a realistic potential for liquid argon.
In 1974, Rahman and Stillinger performed the first molecular
dynamics simulation of a realistic system, a simulation of
liquid water. The first protein simulation appeared in 1977 with
the simulation of the bovine pancreatic trypsin inhibitor (BPTI)
[8].

The main purposes of molecular dynamics simulation are:

• Generate the trajectories of molecules over a limited time
period.

• Serve as a bridge between theory and experiment.

• Allow chemists to run simulations that cannot be done in the
laboratory.

C. The Concepts of Molecular Dynamics 

In molecular dynamics, the forces between molecules are
calculated explicitly and the motion of the molecules is
computed with an integration method. This method is used to
solve Newton's equations for the constituent atoms. The starting
conditions are the positions and velocities of the atoms. Based
on Newton's equations, from the starting positions it is
possible to calculate the next positions and velocities of the
atoms after a small time interval, and the forces at the new
positions. This can be repeated many times, even up to hundreds
of times. The molecular dynamics procedure can be described with
the following flowchart:
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Figure 1. Flowchart molecular dynamics [13] 



The figure above shows the process of a molecular dynamics
simulation. The arrows indicate the sequence in which the steps
are carried out. The main steps are calculating forces,
computing the motion of the atoms, and producing a statistical
analysis of the configuration of each atom.
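The force, motion, force loop described above can be sketched
for a single particle. The paper does not name the integration
method, so the velocity Verlet scheme used here is an assumption
chosen for illustration, as are the one-dimensional spring
force, the unit mass and the time step.

```python
def velocity_verlet(x, v, force, mass, dt, steps):
    """Integrate Newton's equation m * a = F(x) with velocity Verlet.

    Returns the list of (position, velocity) pairs, initial state included.
    """
    f = force(x)
    trajectory = [(x, v)]
    for _ in range(steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # force at the new position
        v = v_half + 0.5 * dt * f / mass   # complete the velocity update
        trajectory.append((x, v))
    return trajectory


if __name__ == "__main__":
    k = 1.0  # assumed spring constant of a toy harmonic force
    traj = velocity_verlet(x=1.0, v=0.0, force=lambda pos: -k * pos,
                           mass=1.0, dt=0.01, steps=1000)
    x_end, v_end = traj[-1]
    energy = 0.5 * v_end ** 2 + 0.5 * k * x_end ** 2
    print(f"x = {x_end:.4f}, v = {v_end:.4f}, energy = {energy:.6f}")
```

A real simulation applies the same update to every atom in
three dimensions, with forces taken from a force field instead
of a single spring.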
condition is the classical way used in Gromacs to reduce edge
effects in the system. The atoms are placed in a box, surrounded
by copies of themselves.
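For a rectangular box, the periodic boundary condition comes
down to two small operations, shown here along one axis. This is
a minimal sketch with hypothetical function names; it assumes a
rectangular box and does not cover the triclinic or octahedral
boxes that Gromacs also supports.

```python
def wrap(coord, box_length):
    """Map a coordinate back into the primary box [0, box_length)."""
    return coord % box_length


def minimum_image(delta, box_length):
    """Shortest separation between two atoms, counting periodic copies."""
    return delta - box_length * round(delta / box_length)


if __name__ == "__main__":
    box = 5.0
    # An atom leaving through one face re-enters through the opposite face.
    print(wrap(5.5, box), wrap(-0.5, box))
    # Atoms 4.0 apart in a 5.0 box are only 1.0 apart through the boundary.
    print(minimum_image(4.0, box))
```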

In Gromacs there are several box models, namely triclinic,
cubic, and octahedron. The second concept is the group. This
concept is used in Gromacs to denote a set of atoms on which an
action is performed. There can be a maximum of 256 groups, and
each atom can belong to at most six different groups.

B. Install Gromacs 

Gromacs can run on the Linux and Windows operating systems. To
run Gromacs on multiple computers, an MPI (Message Passing
Interface) library is required for parallel communication.
Gromacs can be downloaded from http://www.gromacs.org.

Gromacs is installed as follows:

1. Download FFTW from http://www.fftw.org

2. Extract the FFTW file
% tar xzf fftw3-3.0.1.tar.gz
% cd fftw3-3.0.1

3. Configuration



III. Gromacs 



A. Gromacs Concepts 
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Figure 2. Periodic boundary condition In Two Dimensions [7] 

Gromacs is an application that was first developed by the
department of chemistry at Groningen University. It is used to
perform molecular dynamics simulations and energy minimization.
The concepts used in Gromacs are the periodic boundary condition
and the group. The periodic boundary


% ./configure --prefix=/home/anas/fftw3 --enable-float

4. Compile fftw
% make

5. Install fftw
% make install

6. After fftw is installed, install Gromacs. Extract Gromacs.
% tar xzf gromacs-3.3.1.tar.gz
% cd gromacs-3.3.1

7. Configuration
% export CPPFLAGS=-I/home/anas/fftw3/include
% export LDFLAGS=-L/home/anas/fftw3/lib
% ./configure --prefix=/home/anas/gromacs

8. Compile and install gromacs
% make && make install

C. Flowchart of Gromacs 

Gromacs needs several steps to set up the input files for a
simulation. The steps can be seen in the flowchart below, which
illustrates how to perform a molecular dynamics simulation of a
protein. The steps are divided into:
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1. Conversion of the pdb file 

At this step the pdb file is converted to a gromos file (gro)
with pdb2gmx. Pdb2gmx also creates the topology file (.top).

2. Generate box

At this step, editconf determines the type and the size of the
box that will be used in the simulation. In Gromacs there are
three types of box, namely triclinic, cubic, and octahedron.

3. Solvate protein

The next step is to solvate the protein in the box, which is
done by the program genbox. Genbox fills the box defined by
editconf, based on its type. Genbox also determines the type of
water model that will be used and adds the number of water
molecules needed to solvate the protein. The water model
commonly used is SPC (Simple Point Charge).
between atoms can be removed by energy minimization. Gromacs
uses an mdp file to set up the parameters. The mdp file
specifies the number of steps and the cut-off distance. Use
grompp to generate the input file and mdrun to run the energy
minimization. The energy minimization may take some time,
depending on the CPU [21].
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Figure 3. Flowchart Gromacs [16]. 

4. Energy minimization

The process of adding hydrogens or terminations may bring atoms
in the protein too close together, so that collisions occur
between the atoms. The collision


5. Molecular dynamics simulation

The process of molecular dynamics simulation is the same as for
energy minimization. Grompp prepares the input file for mdrun.
Molecular dynamics simulation also needs an mdp file to set up
the parameters. Most mdrun options used in molecular dynamics
are also used in energy minimization, except -x, which generates
the trajectory file.

6. Analysis

After the simulation has finished, the last step is to analyze
the simulation result with the following programs:

• ngmx to display the trajectory

• g_energy to monitor energy

• g_rms to calculate the RMSD (root mean square deviation)
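The RMSD reported in the analysis step measures how far a
structure has moved from a reference. The sketch below shows
only the bare formula, the square root of the mean squared
distance between matching atoms. It assumes the two structures
are already superimposed and ignores any fitting or mass
weighting that an analysis tool may apply; the function name and
the sample coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally long coordinate lists.

    Each entry is an (x, y, z) tuple for one atom, in the same atom order.
    """
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("need two non-empty structures of equal size")
    total = 0.0
    for atom_a, atom_b in zip(coords_a, coords_b):
        total += sum((a - b) ** 2 for a, b in zip(atom_a, atom_b))
    return math.sqrt(total / len(coords_a))


if __name__ == "__main__":
    reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    shifted = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
    print(rmsd(reference, shifted))
```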

D. File Format 

In Gromacs, there are several file formats:

• Trr: a file format that contains the trajectory data of a
simulation. It stores information about the coordinates,
velocities, forces, and energies.

• Edr: a file format that stores information about the energies
during the simulation and energy minimization.

• Pdb: the file format used by the Brookhaven Protein Data
Bank. This file contains information about the positions of
atoms in the molecular structure, with coordinates given in
ATOM and HETATM records.

• Xvg: a file format that can be read by Grace. This file is
used to present data as graphs.

• Xtc: a portable format for trajectories. This file stores the
trajectory data in Cartesian coordinates.

• Gro: a file format that provides information about the
molecular structure in gromos87 format. The information is laid
out in columns, from left to right.

• Tpr: a binary file that is used as the input file for the
simulation. This file cannot be read with a normal text editor.

• Mdp: a file format that allows the user to set up the
parameters for a simulation or energy minimization.

E. Gromacs Programs 
1) Pdb2gmx 
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Pdb2gmx is a program that is used to convert pdb file. 
Pdb2gmx can do several things, such as reading a pdb file,
adding hydrogens to the molecular structure, and generating a
coordinate file and a topology file.

2) Editconf 

Editconf is used to define the water box that will be used for
the simulation. This program not only defines the box model, but
also sets the relative distance between the edge of the box and
the molecules. There are 3 types of box:

•Triclinic, a triclinic-shaped box

•Cubic, a rectangular box with all sides equal

•Octahedron, a combination of an octahedron and a
dodecahedron.

3) Grompp 

Grompp is a pre-processor program. Grompp has the following
abilities:

• Read a molecular topology file.

• Check the validity of the file.

• Expand the topology from molecular information into atomic
information.

• Recognize and read the topology file (*.top), the parameter
file (*.mdp) and the coordinates file (*.gro).

• Generate the *.tpr file as input for the molecular dynamics
and energy minimization that will be done by mdrun.

Grompp copies any required information from the topology file.

4) Genbox 

Genbox can do 3 things:

• Generate a solvent box

• Solvate a protein

• Add extra molecules at random positions

Genbox removes an atom if the distance between solvent and
solute is less than the sum of the van der Waals radii of the
atoms involved.

5) Mdrun 

Mdrun is the main program for computational chemistry in
Gromacs. It not only performs molecular dynamics simulation, but
can also perform Brownian dynamics, Langevin dynamics, and
energy minimization. Mdrun reads a tpr file as input and
generates three types of file: a trajectory file, a structure
file, and an energy file.
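The programs described in this section are normally chained
together, each one consuming the files written by the previous
one. The sketch below only builds and prints the command lines
and runs nothing. All file names (protein.gro, topol.top,
em.mdp, md.mdp and so on) are hypothetical placeholders, the
defaults for the box type and distance are assumptions, and the
options shown are the common file options of the Gromacs 3.3
tools rather than settings reported in this paper.

```python
def build_pipeline(pdb_file, box_type="cubic", distance_nm=1.0):
    """Return the command lines of a basic Gromacs 3.3 protein workflow."""
    return [
        # 1. Convert the pdb file and write the topology.
        ["pdb2gmx", "-f", pdb_file, "-o", "protein.gro", "-p", "topol.top"],
        # 2. Define the box type and the solute-to-edge distance.
        ["editconf", "-f", "protein.gro", "-o", "boxed.gro",
         "-bt", box_type, "-d", str(distance_nm)],
        # 3. Fill the box with SPC water.
        ["genbox", "-cp", "boxed.gro", "-cs", "spc216.gro",
         "-o", "solvated.gro", "-p", "topol.top"],
        # 4. Energy minimization: preprocess, then run.
        ["grompp", "-f", "em.mdp", "-c", "solvated.gro",
         "-p", "topol.top", "-o", "em.tpr"],
        ["mdrun", "-s", "em.tpr", "-o", "em.trr",
         "-c", "minimized.gro", "-e", "em.edr"],
        # 5. Molecular dynamics: same pattern, plus -x for the xtc trajectory.
        ["grompp", "-f", "md.mdp", "-c", "minimized.gro",
         "-p", "topol.top", "-o", "md.tpr"],
        ["mdrun", "-s", "md.tpr", "-o", "md.trr", "-x", "md.xtc",
         "-c", "final.gro", "-e", "md.edr"],
    ]


if __name__ == "__main__":
    for command in build_pipeline("protein.pdb"):
        print(" ".join(command))
```

Note that pdb2gmx normally asks interactively for a force
field, so the first command is not fully unattended as written.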

IV. Result of Simulation 

The testing was carried out on different types of protein. Each
protein has a different structure and number of atoms. Testing
follows the Gromacs flowchart. It consists of two processes: the
first is energy minimization and the second is molecular
dynamics simulation. The number of steps is 200 numstep for
energy minimization and 500 numstep for molecular dynamics
(numstep = 1 ps).



From the testing performed on 4 different types of protein, the
difference in the form of the molecule before and after
simulation can be seen. In the molecular dynamics simulation,
the protein structure changes from the folded state to the
unfolded state. The mechanism is shown in Figure 4.1.

In the molecular dynamics simulation above, each protein has a
different simulation speed. From the data above we see the
differences in simulation time for each protein. The simulation
time is depicted with a non-linear graph. The simulation time is
influenced not only by the number of atoms but also by the
number of chains and water blocks. In the case of the protein
Ribonucleoside-Diphosphate Reductase 2 Alpha, although the
number of atoms is greater than for the protein IgG1-kappa D1.3
Fv (Light Chain), the simulation is faster, because the numbers
of water blocks and chains in this protein are lower than in
IgG1-kappa D1.3 Fv (Light Chain).
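The lack of a simple proportion between atom count and
simulation time can be checked directly from the figures
reported in Table I. The short sketch below converts each time
to seconds and divides by the number of atoms; the variable and
function names are hypothetical, and the numbers are copied from
the table.

```python
# (number of atoms, simulation time as "minute:second") from Table I.
RESULTS = {
    "Alpha-Lactalbumin": (7960, "34:07"),
    "IgG1-kappa D1.3 Fv (Light Chain)": (2779, "20:07"),
    "Ribonucleoside-Diphosphate Reductase 2 Alpha": (5447, "3:30"),
    "Lysozyme C": (1006, "1:02"),
}


def to_seconds(minute_second):
    """Convert a "minute:second" string to a number of seconds."""
    minutes, seconds = minute_second.split(":")
    return int(minutes) * 60 + int(seconds)


def seconds_per_atom(atoms, minute_second):
    """Simulation time divided by atom count, in seconds per atom."""
    return to_seconds(minute_second) / atoms


if __name__ == "__main__":
    for protein, (atoms, elapsed) in RESULTS.items():
        print(f"{protein}: {seconds_per_atom(atoms, elapsed):.3f} s per atom")
```

If time depended on the atom count alone, the per-atom cost
would be roughly constant across the four proteins, which it is
not.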



[Figure 4.1 panels: 200 ps; 300 ps]
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Figure 4. Figure 4.1 Mechanism Unfolded State [16] 
TABLE I. Simulation Time for 500 picoseconds

Protein                                         Number of Atoms    Simulation Time for 500 ps (minute:second)
Alpha-Lactalbumin                               7960               34:07
IgG1-kappa D1.3 Fv (Light Chain)                2779               20:07
Ribonucleoside-Diphosphate Reductase 2 Alpha    5447               3:30
Lysozyme C                                      1006               1:02
[13] http://www.compsoc.man.ac.uk/~lucky/Democritus/Theory/moldyn1.html

[14] http://www.ch.embnet.org/MD_tutorial/pages/MD.Part1.html

[15] http://www.gizi.net

[16] http://www.gromacs.org

[17] http://ilmu-kimia.netii.net

[18] http://ilmukomputer.org/



V. Conclusion 

This paper introduces Gromacs as one of the applications able to
perform molecular dynamics simulation, especially for proteins.
In this work, testing was carried out on four different types of
protein. From the test results, it can be seen that each protein
has a different simulation time.

For the protein Alpha-Lactalbumin, with 7960 atoms, the
simulation time is 34 minutes 7 seconds. For IgG1-kappa D1.3 Fv
(Light Chain), with 2779 atoms, the simulation time is 20
minutes 7 seconds. For Ribonucleoside-Diphosphate Reductase 2
Alpha, with 5447 atoms, the simulation time is 3 minutes 30
seconds. For Lysozyme C, with 1006 atoms, the simulation time is
1 minute 2 seconds. In addition, Gromacs also helps in
understanding the mechanisms of protein folding and unfolding.
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Abstract- The main purpose of information security is to protect information and specifically, the integrity, 
confidentiality, and availability of data through an organization's network and telecommunication channels. 
Although information security is critical for organizations to survive, a number of studies continue to report 
incidents of critical information loss. To this end, there is growing interest in studying information security 
from a non-technical perspective. Accordingly, this research focuses on the linkage between information security 
and end-user trust as a way to better understand and more efficiently manage the information security 
management process, that is, to manage information security more effectively among end-users. Achieving the 
required level of information security within organizations usually requires security awareness and control, but 
also a better understanding of the end-user behavior to which security measures are tailored. In effect, 
organizations may gain a clearer insight into how to respond more effectively to such security measures. 

Keywords- Information Security, End-user Trust, Information Technology 



I. INTRODUCTION 

The reliance by every organization upon 
information technology (IT) has increased 
dramatically, as technology has developed and 
evolved. Over recent decades, organizations have 
come to depend on IT for operations, external 
transactions, and mediated communications (e.g., 
e-mail, facsimile). Similarly, information has developed 
into a strategic asset, while the computerized 
information systems have become ultimate strategic 
tools for both government and organizations [1,2]. 
Due to globalization and competitive economic 
environments, efficient information management is 
critical to business survival and effective decision 
making activities. However, as connectivity to 
devices has increased, so has the likelihood of 
unauthorized intrusion into systems, theft, defacement, 
and other forms of information resource loss. 



In a similar vein, as society and its economic 
patterns have evolved from the heavy-industrial era 
to that of the information society, in terms of providing 
new products and services to satisfy people's needs, 
organizational strategies have changed too. In effect, 
corporations have altered their organizational and 
managerial structures as well as work patterns in 
order to leverage technology to its greatest advantage. 
Economic and technology phenomena such as 
downsizing, outsourcing, distributed architecture, 
client/server and e-banking, all include the goal of 
making organizations leaner and more efficient. 
However, information systems are deeply exposed to 
security threats as organizations push their 
technological resources to the limit in order to meet 
organizational needs [3,4]. 

A number of major studies recently conducted 
[5,6,7] have indicated that security threats continue to 
rise. While security attacks are either internal or 






http://sites.google.com/site/ijcsis/ 
ISSN 1947-5500 



(IJCSIS) International Journal of Computer Science and Information Security, 
Vol.9, No. 2, February 2011 



external, 66% of computer attacks in Greece come 
from employees within organizations [8]. To this end, 
the success of information security appears to depend, 
in part, upon the effective behavior and understanding 
of the individuals involved in its use. Constructive 
behavior by end users and system administrators can 
improve the effectiveness of information security. 
Human behavior is complex and multi-faceted, and 
this becomes more complicated in organizations 
where the culture defies the expectations for 
control and predictability that developers routinely 
assume for technology. In support of this, the 
Guidelines for the Security of Information Systems [9] 
also state that: "The diversity of system users - 
employees, consultants, customers, competitors or the 
general public - and their various levels of awareness, 
training and interest compound the potential 
difficulties of providing security". 

The present research takes a different perspective 
on this issue by focusing on behavioral information 
security: the values and beliefs held by end-users that 
influence the confidentiality, availability, and 
integrity of data through the organizations' 
information systems. To this end, this research 
examines the extent to which information security 
behaviors relate to end-user trust, that is, openness to 
the efficient communication of security risk messages. 
The main research assumption is that end-user trust 
relates positively to the enactment of 
information security behaviors such as following new 
security policies and communicating security 
messages that support the organizations' 
business objectives. Hence, information security 
should support the mission of the organization; it 
must be cost effective and must fit seamlessly with 
end-user behavior, that is, integrate 
technology, processes and people. 



II. BRIEF INFORMATION 
SECURITY BACKGROUND 
Although a number of IS security approaches have 
been developed over the years that reactively 
minimize security threats such as checklists, risk 
analysis and evaluation methods, there is a need to 
establish mechanisms to proactively manage IS 
security. That said, academics' and practitioners' 
interest has turned to social and organizational factors 
that may influence IS security 
development and management. For example, 
Reference [10] has emphasized the importance of 
understanding the assumptions and values of different 
stakeholders for successful IS implementation. Such 
values have also been considered important in 
organizational change [11], in security planning [12] 
and in identifying the values of internet commerce to 
customers [13]. Reference [4] has also used the 
value-focused thinking approach to identify 
fundamental and means objectives, as opposed to 
goals, that would be a basis for developing IS security 
measures. These value-focused objectives were more 
of the organizational and contextual type. 

A number of studies have investigated inter-organizational 
trust in a technical context. Some of 
them have studied the impact of trust in an 
e-commerce context [14,15,16] and others in virtual 
teams [17,18]. Reference [19] studied trust as a factor 
in the success of social engineering threats and found that 
people who were trusting were more likely to fall 
victim to social engineering than those who were 
distrusting. Reference [20] used a goal setting 
approach to identify weaknesses in security 
management procedures and found that different 
political agendas negatively influenced the level of 
security goal setting. 

Reference [21, p. 1551] also reviewed 1043 
papers of the IS security literature for the period 
1990-2004 and found that almost 1000 of the papers 
were categorized as 'subjective-argumentative' in 
terms of methodology, with field experiments, 
surveys, case studies and action research accounting 
for less than 10% of all the papers. That said, this 
research adopts a survey approach to study the 
linkage between information security and end-user 
trust as no prior research has studied these specific 
contexts and their interrelationship. 

III. INFORMATION SECURITY BEHAVIOR 

Information security behavior is part of the 
corporate culture and defines how employees see the 
organization [22]. Most of the literature on 
organizational culture focuses on the hypothesis that 
strong cultures enhance organizational performance 
[23,24]. This hypothesis is based on the notion that 
having widely shared and commonly held strong 
organizational norms and values leads to higher 
performance in at least three ways. First, a 
strong culture enhances coordination and control 
within the organization. Second, it improves goal 
alignment between the organization and its members. 
Third, a strong corporate culture improves employee 
efforts. 

Similarly, organizational culture is a system of 
learned behavior which is reflected on the level of 
end-user awareness and can have an effect on the 
success or failure of the information security process. 
Reference [25] found that users considered a 
user-involving approach to be much more effective for 
influencing user awareness and behavior in 
information security. Reference [26] studied 
influences that affect a user's security behavior and 
suggested that, by strengthening security culture, 
organizations may achieve significant security gains. 
Reference [27] investigated security information 
management as an outsourced service and suggested 
augmenting security procedures as a solution, while 
[28] suggested a model based on the Direct-Control 
Cycle for improving the quality of policies in 
information security governance. Reference [29] 



discussed the importance of gaining improvements 
from software developers during the software 
developing phase in order to avoid security 
implications. Reference [30] advanced a new model 
that explains employees' adherence to IS policies and 
found that threat appraisal, self-efficacy and response 
efficacy have an important effect on intention to 
comply with information security policies. 

Behavior, in terms of information security, is the 
perception of organizational norms and values 
associated with information security and so it exists 
within the organizations, not in the individual. To this 
end, individuals with different backgrounds or at 
different levels in the organization tend to describe 
the organization in a similar way [31]. Security culture 
is used to describe how members perceive security 
within the organization. Since security and risk 
minimization are embedded into the organizational 
culture, all employees, managers and end-users must 
be concerned with security issues in their planning, 
managing and operational activities. In order to 
ensure effective and proactive information security, 
all staff must be active participants rather than passive 
observers of information security. In doing so, staff 
must strongly hold and widely share the norms and 
values of the organizational culture in terms of 
information security behavior and perception. 

IV. END-USER TRUST 

Organizational researchers began to study the 
concept of trust in inter-organizational relationships 
and between organizations [32]. A variety of trust 
models have been applied to various research streams 
[33,34] to explain inter-organizational trust in 
different contexts. For instance, a number of studies 
investigated inter-organizational trust in a technical 
context. Some of them have studied the impact of 
trust in e-commerce [14,15,16] and others in virtual 
teams [17,18]. 

However, trust determines the performance of a 
society's institutions and is a propensity of people in a 
society to co-operate to produce socially efficient 
outcomes [35]. Reference [36] defined trust as a habit 
formed over a centuries-long history of horizontal 
networks of association between people covering both 
commercial and social activities. Reference [37] 
defined trust as a "psychological state comprising the 
intention to accept vulnerability based upon positive 
expectations of the intentions or behavior of another" 
(p. 395). 

Reference [38] defined trust as a four-place 
predicate, in the sense that someone has trust in someone, 
in something, in some respect and under some 
conditions. That means the agent trusting (someone), 
the agent being trusted (respect) and the (conditions) 
under which trust is given. Hence, this research 
holds that in information security there is a need to 
trust one another in communicating 
information security risk messages efficiently. Specifically, 
end-users will provide, and not hide, valuable 
information among other people in order to maintain 
awareness, control and a better understanding of 
security issues within organizations. 

According to [33], individuals' beliefs about 
another's ability, benevolence and integrity lead to 
willingness to risk, which in turn leads to risk-taking 
in a relationship, as manifested in a variety of 
behaviors. Therefore, a higher level of trust in a work 
partner increases the likelihood that one will take a 
risk with that partner, e.g., to co-operate, share 
information, communicate. In doing so, risk-taking 
behavior is expected to lead to positive outcomes, 
e.g., individual performance, while in social units 
such as work groups, co-operation and information 
sharing are expected to lead to higher group 
performance [39,40]. 

However, other studies that examined the main 
effect of trust on workplace behaviors and outcomes 
found partial or no support. Some studies reported a 



significant main effect and others did not. More 
specifically, [41] found that trust within groups has a 
positive effect on openness in communication while 
[42] found that trust between negotiators mediated the 
effects of social motives and punitive capability on 
information exchange. Reference [43] proposed that 
trust is a necessary, but not sufficient, condition for 
co-operation. This terminology suggests that trust may act 
as a moderator, although the model does not 
specifically consider how trust might operate in this 
manner. 

However, since high levels of trust within 
organizations have a positive effect on openness to 
communication [33], high levels of trust among 
end-users would improve the communication of 
security messages in the context of information 
security. In this respect, this research examines the linkage 
between information security and end-user trust as a 
holistic approach to information security, that is, one that 
integrates technology, people and processes. 

V. SURVEY OF PERCEPTIONS 

Three hundred and twenty seven (143 women and 
184 men) employees of a large bank in Greece 
took part in the survey. The respondents ranged from 
junior staff to senior management and were between 
the ages of 22 and 65. They completed an anonymous 
survey questionnaire that was circulated personally by 
the principal researcher and consisted of 18 items. 
The questions were designed to solicit a response on 
the participant's perception of risk, their trust in the 
likelihood of others behaving according to organizational 
norms and values, and their trust in others to communicate 
security messages efficiently within the organization. 
Table 1 below shows an example of questions. 

For the trust behaviour based questions, 
respondents evaluated their likelihood of engaging in 
risk behaviours (i.e., '...indicate the likelihood of 
engaging in each activity') on a five point rating scale 
ranging from 'Very likely' (1) to 'Very unlikely' (5). 
For the security perception questions, respondents 
rated their perception of the risk presented by each 
risky behaviour (i.e., '...indicate how risky you 
perceive each activity to be') on a five point scale 
ranging from 'Very significant' (1) to 'Very 
insignificant' (5). 



15. In your opinion what is the likelihood of people in the organization participating in the following activities: 

    Share their passwords with other employees. 
    Access files they are not authorized for. 

16. For each of the following activities, please indicate how risky you perceive each activity to be: 

    Share your password with another employee. 
    Access files you are not authorised for. 

17. Please indicate your perception of others in communicating efficiently in the following security related 
activities: 

    Challenge the knowledge of another employee on security related tasks. 
    Hide information from a co-employee in order to prove your skills. 

18. For each of these activities, please indicate the likelihood of others to behave to organizational norms and 
values: 

    Do not meet expiration dates on given tasks. 
    Do not share your knowledge with others due to competitive reasons. 

Table 1. Example of Questions 

For the questions on trust in the efficient communication 
of security messages, respondents rated their 
perception of the likelihood of other people in the 
organization communicating in activities (i.e., '...your 
opinion what is the likelihood of people in the 
organization participating and communicating in the 
following activities') on a five point rating scale ranging 
from 'Very likely' (1) to 'Very unlikely' (5). 

The information in this report is based on the 
initial response of the three hundred and twenty seven 
participants. Using a variation of the formula in [44] to 
determine sample sizes necessary for given 
combinations of precision, confidence levels and 
variability, this survey should have a confidence level 
of 95% with a precision level of greater than ±4%. 

The main purpose of the survey was to find out 
the following: What is the individual's 
perception of the risk involved with certain activities? 
What are the individuals' levels of trust in the 
likelihood of others in the organization behaving according to 
certain organizational norms and values with regard to 
certain security activities? What are the individuals' 
levels of trust in communicating 
information security risk messages efficiently within the 
organization? 
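
The relation between sample size, confidence level and precision mentioned for the 327 respondents can be illustrated with the textbook normal approximation for a proportion. This is a sketch under stated assumptions, not the formula of [44], which is not reproduced in the text: simple random sampling, maximum variability (p = 0.5), z = 1.96 for 95% confidence, and no finite-population correction. The function names are hypothetical.

```python
import math


def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Half-width of the confidence interval for a proportion (normal approximation)."""
    return z * math.sqrt(p * (1.0 - p) / n)


def required_sample_size(e: float, p: float = 0.5, z: float = 1.96) -> int:
    """Smallest n whose margin of error does not exceed e under the same assumptions."""
    return math.ceil(z ** 2 * p * (1.0 - p) / e ** 2)


if __name__ == "__main__":
    print(f"n = 327 -> margin of error = {margin_of_error(327):.4f}")
    print(f"margin 0.04 -> required n = {required_sample_size(0.04)}")
```

Under these particular assumptions, 327 responses correspond to a half-width of roughly 5.4 percentage points. The variation of [44] used in the survey may include further terms, such as a correction for the size of the bank's workforce, which are not specified here.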

The intended outcome of this research is to 
develop a strategy to improve organizational 
information security and to enhance the level of trust 
in communicating security messages efficiently 
within organizations. The questions analyze the 
different components relating to information security: 
1) individual perception of risk, 2) individual 
perception of trust that others will behave according 
to organizational norms and values, 3) individual 
perception of trust in communicating efficiently 
within information security activities. 

Table 2 below shows the responses, in 
percentages, for the individual perception of risks for 
certain activities (perceived values), the individual 
perception of trust that others are determined to 
communicate efficiently in security-related activities 
(communication), and the individual perception of 
behaving according to organizational norms and values 
(end-user trust). The results give interesting insights and 
reveal gaps in the individual's perception of 
information security and trust in the context of 
organizational norms and values. Male and female 
respondents do not differ significantly in their 
perceptions of risk in all activities, with the exception 
of challenging another's knowledge on security tasks, 
where 62% of females perceived very significant risk 
in undertaking this activity. It would appear that 
female respondents are generally less likely to engage 
in risky behaviour. Surprisingly, 38% of both male and 
female respondents perceive that it is likely or very 
likely that people within the organization are sharing 
passwords with other people. In addition, 84% of 
male and 78% of female respondents perceive it to be 
a significantly risky activity, while 11% of male and 
13% of female respondents implied that they would 
share a password with other people. Thus, it appears 
that while sharing passwords with others is considered 
risky, organizational norms and values ignore such 
behaviour. 
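
Because the male and female groups differ in size (184 men and 143 women), a combined figure for the whole sample is a weighted average rather than a simple mean of the two percentages. The short sketch below illustrates this arithmetic; the pooled values are derived here for illustration and are not reported in the survey, and the function name is hypothetical.

```python
MEN = 184    # male respondents, as reported in the survey description
WOMEN = 143  # female respondents, as reported in the survey description


def pooled_percentage(male_pct: float, female_pct: float,
                      men: int = MEN, women: int = WOMEN) -> float:
    """Weight each group's percentage by its size to get the whole-sample figure."""
    return (male_pct * men + female_pct * women) / (men + women)


if __name__ == "__main__":
    # Share of respondents who see password sharing as a significant risk.
    print(f"perceive risk: {pooled_percentage(84, 78):.2f}%")
    # Share of respondents who implied they would share a password.
    print(f"would share:   {pooled_percentage(11, 13):.2f}%")
```

The weighted figure sits closer to the male percentage because men make up the larger share of the sample.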

In the context of others communicating 
security risk messages efficiently, 23% of male and 33% of female 
respondents perceive hiding information from a 
co-employee as a risky activity, yet 82% of male and 73% 
of female respondents said it was unlikely or very 
unlikely they would participate in the activity. This may 
imply that while individuals do not perceive this as a very risky 
activity, they intend to share information with others, 
which means that the organization's norms and values 
enable cooperation and overall communication among 
the employees. 

Of the total respondents, 42% said that they would 
reuse the same password many times and, in terms of 
information security project communication, 53% said 
that they would ask for clarity of goal achievement in 
case they are confused. Finally, 53% said that project 
communication initiates from top executives and that 
trust in top management provides better 
understanding and control of security issues. In effect, 
communication is improved. The questionnaires were 
taken anonymously to encourage truthful answers, although 
it remains uncertain whether the answers reflect 
what the security policy states or the 
employee's actual behaviour. 
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All figures are shown as 

percentage (%). Each cell gives male/female; "?" marks a value that is illegible in the source. 

Perception of risks for these activities 
(VS = Very Significant, S = Significant, N = Neutral, I = Insignificant, VI = Very Insignificant) 

Activity                                          VS      S       N       I       VI
Share password with others                        50/47   34/31   14/14   12/10   7/5
Challenge new employee in work place              20/24   38/38   17/12   11/13   6/4
Allow another to use ID pass/card                 38/47   33/32   16/16   21/19   ?/3
View or download prohibited material              32/47   31/33   20/10   7/11    5/4
Forge someone's signature                         26/34   45/39   19/6    5/9     3/?
Access unauthorised files                         37/31   41/34   17/17   19/13   4/3
Challenge another's knowledge on security tasks   40/62   30/22   12/11   32/29   12/5
Hide information from other employees             19/21   22/19   12/14   12/21   11/12

Trust of others in communicating efficiently security messages 
(VL = Very Likely, L = Likely, N = Neutral, U = Unlikely, VU = Very Unlikely) 

Activity                                          VL      L       N       U       VU
Share password with others                        18/21   22/19   12/13   29/30   21/22
Challenge new employee in work place              16/14   12/11   13/18   24/21   11/22
Allow another to use ID pass/card                 6/7     3/10    17/13   33/21   19/21
View or download prohibited material              3/1     3/12    11/10   32/29   51/14
Forge someone's signature                         1/1     2/6     5/3     33/21   59/26
Access unauthorised files                         2/3     5/4     15/13   20/19   50/61
Challenge another's knowledge on security tasks   25/31   24/21   12/11   21/19   48/72
Hide information from other employees             21/20   19/24   11/19   34/25   29/26

Perception of trust of the likelihood of others behaving to organizational norms and values 
(VL = Very Likely, L = Likely, N = Neutral, U = Unlikely, VU = Very Unlikely) 

Activity                                          VL      L       N       U       VU
Share password with others                        6/4     7/9     11/14   21/18   49/50
Challenge new employee in work place              30/21   32/28   16/11   29/19   46/10
Allow another to use ID pass/card                 7/3     3/2     17/12   23/18   33/30
View or download prohibited material              3/2     9/11    1/5     37/31   7/23
Forge someone's signature                         4/1     8/2     1/6     11/9    43/56
Access unauthorised files                         3/2     8/4     11/5    12/9    77/56
Challenge another's knowledge on security tasks   35/31   23/21   16/10   19/21   44/43
Hide information from other employees             32/29   31/28   17/22   33/41   49/32

Table 2. Risk perception, perception of trust and likelihood ratings 
by gender. 

VI. LIMITATIONS AND FURTHER 
RESEARCH 

There are opportunities to undertake further 
intensive research to identify more critical 
behavioural and psychological factors and their 
relation in the context of information security. 
Although high levels of end-user trust in goal setting 
plans seem to positively influence information 
security development and management, we cannot be 
sure how these high levels of end-user trust 
could always lead to information security success. 
Future research on information systems security, 
especially research based on surveys, should therefore 
examine the role of other possible factors at the level 
of security planning in addition to end-user trust. 
Likewise, another interesting issue to investigate 
would be the role and type of feedback in 
communication and end-user trust in the context of 
security design, e.g., whether the type of feedback 
(outcome or process feedback) provided affects the 
relationship between communication and end-user trust. 

However, there were some biases during the 
collection of data, mainly due to the suspicious 
attitude of the IT employees towards the researchers. 
That is, the IT employees might have been 
careful in answering survey questions with regard to security 
because the issue of information systems security is 
highly confidential and sensitive. To this end, 
open-ended questions were useful to some extent. 

Moreover, the research findings may be 
influenced by political games that different banking 
units wish to play. As the participation in a research 
survey can help organizational members to voice their 
concerns and express their views they can use this 
opportunity to put forward those views that they wish 
to present to other members of the organization. 



VII. CONCLUSIONS 

There was a belief that information technology 
and security were difficult issues to be understood by 
non-IT staff. Nowadays, it is believed that people 
make the difference to information technology and 
security and that training on the ethical, legal and 
security aspects of information technology usage 
should be ongoing at all levels within organizations 
(Nolan, 2005). Since people react differently to poorly 
constructed security messages, communication will 
break down and may confuse task knowledge and 
security risk awareness among the employees. Thus, 
the main implication for information security 
management is to focus on changing attitudes and 
human behaviour, which are part of the 
organizational norms and values, in order to enhance 
awareness among the employees about information 
security related tasks. In doing so, efficient 
communication of security risk messages among 
end-users will increase, since 
awareness is one of the first steps to obtaining active 
employee participation in the information security 
process and vice versa. That is, well established 
security awareness will ensure security project 
communication through the active participation of 
employees in security related tasks. 

The more organizations rely on information 
systems to survive in competitive markets, the 
greater becomes the need to maintain the 
confidentiality, availability, and integrity of data 
through the organization's network and 
telecommunication channels. However, the 
rate of technological advancement in the use and 
management of these information systems is more 
radical than the development of means for ensuring 
the confidentiality, availability, and integrity of data 
through them. That is, even as organizations become aware 
of security issues, security threats remain high. 
Although achieving the required level of 
information security among end-users requires 
security awareness and control, a better understanding 
of the organization's norms and values, to which 
security measures are tailored, is also important. In 
this way, organizations may gain a clearer insight into 
how to communicate such security 
measures more efficiently. 

This research examined the linkage between 
information security and end-user trust as part of 
behavior to organizational norms and values. The 
main research assumption was that end-user trust in 
terms of others communicating security messages 
efficiently, would overall relate positively to the 
enactment of information security behaviors such as 
following new security policies and new technologies 
that are in effect of the organization's business 
objectives. Information security needs to be 
embedded in organizational norms and values so that 
satisfactory security levels can be achieved through a 
clearer insight into the security measures and 
objectives of the organization. High end-user trust 
levels and well trained end-users can address the 
security planning and management of information 
within an organization. Overall, information security 
should support the mission of the organization; it 
must be cost effective and fit into the organization's 
culture seamlessly, that is, integrate technology, 
processes and people. 

Future research should focus on the perception 
and development of communication strategies and 
how they could be applied to different organizational 
structures, as well as on security measures and policies, 
matched to organizational structure and size, that 
improve end-user awareness of information security. 
That said, differently structured organizations may have 
different business objectives and, therefore, different 
security needs. 
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Abstract — In this letter, we propose a Cellular Automata using 
Vector Quantization Learning for predicting hot mudflow 
spreading area. The purpose of this study is to determine 
the area that will be inundated in the future. Cellular Automata is an 
easy approach to describe the complex states of a hot mudflow disaster 
that has characteristics such as occurring in an urban 
area, levees and surface thermal change. Furthermore, 
Vector Quantization learning determines mass transport in the 
surrounding area in accordance with the equilibrium state using 
clustering of landslide. The prediction result is evaluated using 
ASTER/DEM and SPOT/HRV imaging. A comparison study shows 
that this approach obtains better results in showing the inundated area 
in this disaster. 

Keywords: Probabilistic cellular automata, vector quantization, 
hot mudflow spreading, prediction, mass transport 

I. Introduction 

Simulating hot mudflow in plains and urban areas requires 
understanding how the surface properties vary with 
time and space. In order to generate the complex flow arising from 
interactions between natural and human-made topography, we 
need a model of the main mechanical features of hot mud 
that depends on landscape data. Another difficulty is to compute 
the simulation of hot mudflow at acceptable rates. However, 
such models are difficult to apply in general conditions. 
Argentini [1] introduced a CA approach to simulate fluid 
dynamics with some obstacles and fluid flow parameters. This 
approach used basic rules in two-dimensional space. 
Vicari [2] introduced a CA approach to simulate lava flow. This 
approach used the Newtonian fluid dynamics concept. 
Combining both approaches yielded a discrete approach 
for predicting hot mudflow [3]. This approach yielded the correct 
location and direction of the hazardous area, but the intersection 
between the predicted area and the real hazardous area is 
around 36.44%. It is a deterministic approach 
based on Cellular Automata that estimates the areas potentially 
exposed to hot mudflow inundation, concentrates on mudflow 
characteristics, combines fluid flow and lava flow properties, 
and neglects the difficulty of modeling complex human-made 
landscape data and the random behavior of state changes. 
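
An overlap figure such as the 36.44% quoted above can be computed from two sets of grid cells. The text does not state how the percentage is normalized, so the sketch below assumes, purely for illustration, that it is the share of observed (real) inundated cells that the prediction also marks as inundated; the function name and the toy data are hypothetical.

```python
from typing import Set, Tuple

Cell = Tuple[int, int]  # (row, column) index of a grid cell


def intersection_percentage(predicted: Set[Cell], observed: Set[Cell]) -> float:
    """Percentage of observed inundated cells that are also predicted (assumed definition)."""
    if not observed:
        raise ValueError("the observed area must contain at least one cell")
    return 100.0 * len(predicted & observed) / len(observed)


if __name__ == "__main__":
    # Toy example: 3 predicted cells, 4 observed cells, 2 cells in common.
    predicted = {(0, 0), (0, 1), (1, 1)}
    observed = {(0, 1), (1, 1), (2, 1), (2, 2)}
    print(intersection_percentage(predicted, observed))
```

Other normalizations, for instance dividing by the union of both areas, would give a different number for the same pair of maps, so the definition should be fixed before results are compared.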



The previous approach assumes that hot mudflow has 
characteristics similar to lava flow, such as thermal change, fluid 
mass transport rules and material mixing. 
It is difficult to describe some physical phenomena caused by 
complex human-made landscape objects such as levees, 
buildings, and other environmental properties. Avolio et al. [4] 
have proposed an alternative Cellular Automata using minimization 
of differences to simulate lava flow. This approach has 
stochastic state changes. The key point of this approach is that it is 
easy to develop. Recently, D'Ambrossio et al. [5] and Del 
Negro et al. [6] have applied the stochastic approach to 
simulate soil erosion. This approach also uses minimization of 
differences based on Cellular Automata for other fluid flow 
phenomena. Using the stochastic approach 
allows the alternative approach to describe complex landscape 
object problems in the hot mudflow disaster [7]. The problem 
with this idea is how to fix the probability value of mass transport to 
each neighbor cell. 

The aim of this letter is to present a new cellular automata
model for predicting the hazardous area in the hot mudflow
disaster. This approach combines a minimization difference
model with vector quantization, which clusters the mass
transport possibility depending on altitude, height of mud
and plant [8]. Because vector quantization produces continuous
clusters, it resembles the statistical behavior of landscape
objects in the urban area. Vector quantization determines the
clusters of the inundated area [9], which makes the flow
difference in a neighborhood easy to express as probability
values. A similar approach has not yet been undertaken for
mudflow and lava flow elsewhere, although one has appeared for
landslide areas. However, only a simple cellular automata
approach is considered there.

The simulation uses a landscape map derived from ASTER DEM
and the initial parameters of hot mudflow. This paper shows
simulation results as map views at varying times, together with
the percentage prediction performance. We also compare the
predicted inundated area and direction with those of the
previous approaches.

II. Overview of Fluid Dynamic Cellular Automata

Most numerical approaches to modeling landscape
evolution simulate physical flow, such as mass transport of
fluid particles, erosive effects of water discharge, infiltration
and absorption, by solving complex differential equations. CA
is an alternative that simulates fluid flow with simple rules.
The current implementation is primarily based on
D'Ambrossio et al. [5] because it uses "very simple
approximations intended to describe complex geographical
effect" and is able to offer "insight into how thermal and
viscous fluid parameter affects the evolution of landscapes"
despite its simplicity.

The CA algorithm simulates first-order processes
associated with fluvial erosion by iteratively applying a set of
simplified rules to individual cells of a digital topographic grid
[10]. The state represents the number of fluid particles in a cell
of the topographic grid, and the subsequent movement and behavior
(diffusion and erosion) of the cell are controlled by the rules and
a few parameters of the current cell and its surrounding
neighbors [11]. The same rules are applied to all grid cells, i.e.,
there is no outside-imposed distinction between slope and
channel; the model forms its own channels [11].

Figure 1 illustrates how the algorithm works. For example,
fluid particles move to lower elevations, simulating fluid flow
in the landslide grid. There are two kinds of flow: erosion and
diffusion. The amount of erosion and diffusion each produces
is proportional to the local slope, simulating faster erosion of
steeper slopes and lesser erosion of hard rock surfaces.

Figure 1. Schematic diagram showing how CA model works

Xiaoming Wei [12] introduced a simple CA approach for
highly viscous fluid, whose movement is mainly a result of
gravity, viscosity damping and friction. This approach uses four
variables to indicate the expanding potential of a liquid cell:
solid, liquid, amount of material and energy. Setting a
threshold on the energy variable makes it possible to control the
expanding behavior of the liquid. For each liquid cell, if its
energy is higher than the threshold, it has the potential to
spread to its horizontal neighboring cells [17]. This
approach uses four nearest neighbors and four second-nearest
neighbors.

Another CA approach to simulating fluid flow uses the
minimization difference approach introduced by
Avolio [4] and D'Ambrossio [5]. It is an alternative way to
solve fluid dynamics without sophisticated mathematical
formulation. It yields a satisfactory model for simulating
lava flow with various parameters such as viscosity and surface
thermal change. This approach is powerful for simulating fluid
flow and easy to develop.


III. Proposed Approach 

A. General Characteristic of Hot Mudflow Disaster 

On 29 May 2006, a gas exploration operation caused a
cauldron of hot mud at a depth of 6.3 km to spray hot mud over
the surrounding areas in Sidoarjo, East Java, Indonesia
(7.530553° S; 112.709684° E) [13][14]. The disaster is located in
the urban area near Sidoarjo (Figure 2-top). Hot mud first
spilled at over 5,000 m³ per day. The rate then increased to over
170,000 m³ per day as reported by Cyranoski [15] and over
150,000 m³ per day as reported by Harsaputra [16].




Figure 2. The location of hot mudflow disaster 

Hot mudflow will have an immense impact on the environment,
the economy and human resources in the future if no
countermeasure is taken (Figure 2-bottom) [17]. Within
the first two years, the mudflow disaster destroyed villages,
farm lands, factories and public facilities such as schools,
markets, roads, water pipes and gas pipes. Over 17,000 people
lost their houses and jobs. In fact, approximately 150,000 m³ of
mud blows out per day, and it is assumed to contain 70% water.
This implies that water comes out at about 687,000 barrels a day.
This situation differs from disasters that previously occurred
in other locations because of the very large amount of mud [18].






Although one possible solution is a spillway to the Porong River,
it is costly and requires a long time and vast human resources.
Therefore, people living in the disaster area strongly demand
prediction of the mudflow spreading volume and the mudflow
disaster area, as well as guidance on how to evacuate from the
area in which the levee constructed to prevent mudflow spillover
is located. If the inundated area is predicted before the mud
arrives, the Indonesian government can take countermeasures to
reduce the impact.

This simulation uses the map of February 2008 (Figure 3a) as the
initial map and the map of August 2008 (Figure 3b) as the target map.
The maps are landscape approximations built from ASTER/DEM and
height data at some observation points. The map size is
approximately 3.705 km x 4.036 km. The red area is the mud-inundated
area. In this simulation, mud blows from the main crater (big
hole), which has a diameter of around 20 m [8], and moves to
other locations depending on slope difference and mudflow
parameters. The key process is mass transport, which defines the
amount of mud moving.

Figure 3. (a) Initial map on February 2008, (b) target map on August 2008

B. Model Definition

This model is a 2D CA model. It uses a two-dimensional grid
to describe the set of cells. The state of a cell S is a
floating-point value that gives the amount of mud and soil
particles. In this research, we define two state variables: the
amount of mud s_t(x,y) and the amount of soil h_t(x,y). Mud is the
moving material. It moves from one cell to its neighbors with
probability of movement p_mov. On the other hand, a small part of
the mud also changes into soil with probability of deposition p_vzs.
The model state is shown in Figure 4.

Figure 4. Mud and soil states.

C. Model Definition

In this research, we use probability Cellular Automata
based on Minimization Differences [5] [7] as the main
approach. The algorithm of Minimization Differences is as
follows:

(a) A is the set of cells not eliminated. Initially it contains
    all neighbors of the center cell. Each cell at position (i,j)
    has two components, soil and mud, with heights g_ij and s_ij.
    The total height of the cell is h_ij = g_ij + s_ij. There is
    also dynamic soil u_ij, but it is a small portion of the soil
    and we adjust it on the normal distribution of p_m.

(b) The average height is found for the set A of non-eliminated
    cells:



    m = \frac{h_c + \sum_{i \in A} c \, h_i}{n_A + 1}    (1)



Where:

h_c is the height of the center cell.
h_i is the height of a non-eliminated neighbor cell.
n_A is the number of non-eliminated neighbor cells.
c is the current mass-transport weighting from the learning
process.

(c) The cells with height larger than the average height are
    eliminated from A.

(d) Go to step (b) until no cell is to be eliminated.

(e) The flows, which minimize the height differences locally,
    are such that the new height of each non-eliminated cell is
    the value of the weighted average height:

    h_i = m, \quad i \in A    (2)
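
The elimination loop of steps (a) to (e) can be sketched as follows. This is a minimal illustration of the classic, unweighted minimization of differences for a single cell; it deliberately omits the learned weighting c and the mud/soil split, and the function name and arguments are hypothetical rather than taken from the paper.

```python
def minimize_differences(center, neighbors):
    """Unweighted minimization of differences for one cell (illustrative).

    center    -- height of the center cell (h_c)
    neighbors -- list of neighbor heights (h_i)
    Returns the final average height and the flow sent to each neighbor.
    """
    # Step (a): start with every neighbor in the non-eliminated set A.
    kept = list(range(len(neighbors)))
    while True:
        # Step (b): average over the center cell and the cells still in A.
        average = (center + sum(neighbors[i] for i in kept)) / (len(kept) + 1)
        # Step (c): eliminate cells that are higher than the average.
        survivors = [i for i in kept if neighbors[i] <= average]
        # Step (d): repeat until no cell is eliminated.
        if len(survivors) == len(kept):
            break
        kept = survivors
    # Step (e): each remaining neighbor is raised to the average height.
    flows = [0.0] * len(neighbors)
    for i in kept:
        flows[i] = average - neighbors[i]
    return average, flows


average, flows = minimize_differences(10.0, [2.0, 4.0, 12.0, 20.0])
print(average, flows)
```

In this example the two higher neighbors are eliminated in the first pass, and the material leaving the center cell equals the sum of the flows, so mass is conserved.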



Whereas we used a probability adjustment that depended on height
differences in the previous research, here we use Vector
Quantization learning to build a cluster space of mass transport
that serves as the probability adjustment in the neighborhood
area. We select some points in the previous map and pair each of
them with the nearest point in the current map. We use standard
competitive learning to determine the height of points in the
surrounding area:

    c^{new} = c^{old} + r \, (c^{pair} - c^{old})    (3)

Where:

c^{new} is a new inundated point in the surrounding area.
c^{old} is an inundated point in the previous map.
c^{pair} is an inundated point in the current map.
r is a learning rate.
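
A standard competitive learning step can be sketched as below. This is only an illustration under assumptions: the winner is chosen by squared Euclidean distance, points are plain lists of numbers, and the names `competitive_update`, `nearest_index` and `train_step` are hypothetical.

```python
def competitive_update(c_old, c_pair, rate):
    """Move the old point toward its paired point by a fraction `rate`."""
    return [o + rate * (p - o) for o, p in zip(c_old, c_pair)]


def nearest_index(codebook, sample):
    """Index of the codebook vector closest to `sample` (squared distance)."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(codebook[k], sample)),
    )


def train_step(codebook, sample, rate):
    """One competitive learning step: only the winning vector is updated."""
    winner = nearest_index(codebook, sample)
    codebook[winner] = competitive_update(codebook[winner], sample, rate)
    return winner


codebook = [[0.0, 0.0], [10.0, 10.0]]
winner = train_step(codebook, [8.0, 12.0], rate=0.5)
print(winner, codebook)
```

With a learning rate of 0.5 the winning vector moves halfway toward the sample, while the other vector is left unchanged.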

At each point, several parameters influence mass transport in
the simulation process, such as altitude (ground height), mud
height and landslide [8]. Because of the discontinuous
distribution of abrupt mass movement hazards [19], VQ provides an
alternative method to quickly assess the degree of hazard for
each unit. It creates groups without considering whether or not
the units in the same group are continuously distributed.
Figure 5 shows the processing scheme of the hot mudflow spreading
simulation. The learning process using vector quantization
determines a cluster space that describes the probability of mass
transport. The probability values add weighting to the flow
process in the minimization differences approach.

Figure 5. The schematic of hot mudflow spreading simulation

IV. Simulation Results

In this simulation, we use the native resolution of
ASTER/DEM (30 m x 30 m). The mud blow volume is around
150,000 m³ per day, drawn as a Gaussian random number around
this volume. The mixture is 70% water and 30% solid material.

A. Simulation Results

The simulation result is shown in Figure 6. In this figure,
we show the total inundated area (Figure 6a) and the new
inundated area (Figure 6b). The red area is the real inundated
area, the blue area is the predicted area, and the pink area is
the intersection between the real area and the predicted area.
In Figure 7a, the intersection area is above 95%, which suggests
that this approach yields a good prediction. However, this
measure is not fair, because prediction accuracy matters only for
the new inundated area. Therefore, we compare the predicted area
and the real area in the new inundated area only. Figure 7b shows
that the intersection area in the new inundated area is 71.85%.
This result is better than the previous result obtained with the
minimization difference approach (56.44%) [7]. Figure 7 shows the
comparison between this approach and the other approaches.

Figure 6. The simulation result: (a) total inundated area, (b) new inundated
area using this approach

Figure 7. Comparison of (a) Vicari's approach, (b) Avolio's approach, (c)
CA using Minimum Difference approach, (d) CA using VQ approach

Figure 8 shows that the combination of the CA approach and online
clustering using vector quantization obtains better performance
in predicting the new inundated area (54.13%-69.13%) than the
previous methods with a 3x3 von Neumann neighborhood system at
all resolutions: the minimization differences algorithm of our
previous research (48.15%-65.67%), Avolio's approach
(45.75%-63.34%) and Vicari's approach (43.25%-60.25%).

Figure 8. Comparison with the other approaches
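
The intersection percentages reported in this section could be computed along the following lines. This is a sketch under an assumption the paper does not state explicitly: the intersection is normalized by the real inundated area. The maps are taken to be equally sized 2D grids of truthy and falsy values, and all function names are hypothetical.

```python
def new_only(mask, initial):
    """Keep only cells inundated in `mask` but not in the initial map."""
    return [
        [bool(m) and not bool(i) for m, i in zip(m_row, i_row)]
        for m_row, i_row in zip(mask, initial)
    ]


def intersection_percentage(real, predicted):
    """Share of really inundated cells that were also predicted, in percent."""
    real_cells = sum(1 for row in real for value in row if value)
    both = sum(
        1
        for r_row, p_row in zip(real, predicted)
        for r, p in zip(r_row, p_row)
        if r and p
    )
    if real_cells == 0:
        return 0.0
    return 100.0 * both / real_cells


initial = [[1, 0, 0], [1, 0, 0]]
real = [[1, 1, 0], [1, 1, 0]]
predicted = [[1, 0, 0], [1, 1, 1]]

print(intersection_percentage(real, predicted))
print(intersection_percentage(new_only(real, initial), new_only(predicted, initial)))
```

The second call restricts both maps to newly inundated cells, which mirrors the distinction drawn between the total and the new inundated area.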

B. Resolution Influences 

The simulation was run at several resolutions. At normal size, we
use the ASTER/DEM map, which has a resolution of 30 m and an image
size of 300x300 pixels. The minimum size is 200 pixels (map
resolution of 45 m). The maximum size is 700 pixels (map
resolution of 12.9 m). The prediction performance increases with
increasing resolution and becomes stable at higher resolutions, as
shown in Fig. 9. This figure shows two peaks of intersection area,
at resolutions of 30 m and 20 m.
They occur because the resolution of our ASTER/DEM data is
30 m, and we use additional data (height data at critical points)
that have a resolution of 20 m.
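
The quoted map resolutions are consistent with resampling one fixed ground extent to different image sizes. The short check below assumes that the extent is 300 pixels at 30 m per pixel, as stated for the normal-size map; the function name is hypothetical.

```python
def cell_size_m(extent_m, pixels):
    """Ground size of one cell when `extent_m` metres span `pixels` cells."""
    return extent_m / pixels


# Assumed extent: 300 pixels at 30 m per pixel for the normal-size map.
EXTENT_M = 300 * 30.0

for pixels in (200, 300, 700):
    print(pixels, round(cell_size_m(EXTENT_M, pixels), 1))
```

Under this assumption, 200 pixels gives 45 m per cell and 700 pixels gives about 12.9 m per cell, matching the values in the text.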

V. Conclusion Remarks

Through the simulation study with the proposed model
based on Cellular Automata, we may conclude the following:

(1) Using vector quantization learning in the CA approach
yields much better performance in predicting the new
inundated area in the hot mudflow disaster.

(2) The prediction performance depends on resolution.
Increasing the resolution increases the prediction
performance, which becomes stable at higher resolutions.

(3) The dangerous levee locations for spillover can be found
with the proposed method.

(4) The cell size effect is clarified. Considering the resolution
of the data sources (the ASTER-derived DEM (Digital
Elevation Model) has a resolution of 30 m), the most
appropriate number of CA cells is determined by these
resolutions.

Abstract - Preventing the installation and execution of
unauthorized software should be a high priority for any
organization. Allowing users to install and execute unauthorized
software can expose an organization to a variety of security risks.
In this paper we present a graylisting solution to control
application execution on Linux clients using a loadable kernel
module. Our kernel-based solution, Locking Applications on
Linux Clients (LALC), is a new Linux subsystem that adds a
graylisting application lockdown capability to the Linux kernel.
The restriction policy applied by LALC to a specific client is
based on the preconfigured security level of the client's group
and on the application the client wants to execute or install.
LALC is flexible enough to support business needs as well as new
applications and new versions of existing applications. It is
also secure enough that no end user can circumvent its
configuration.

Keywords-Application Lockdown; Linux Kernel Module; 
Restriction Policy; Whitelisting; Blacklisting; Graylisting. 



I. INTRODUCTION



The rising number of computer security incidents since 
1988 [3] [4] suggests that malware is an epidemic. 

Malware is referred to by numerous names, including malicious
software, malicious code and malcode. Many definitions have been
offered to describe malware. For instance, [7] describes a
malware instance as a program whose objective is malevolent.
Malicious code is defined in [6] as "any
code added, changed, or removed from a software system in
order to intentionally cause harm or subvert the intended
function of the system."

Nowadays, in many organizations, employees can peruse 
web sites, send and receive email, download software, and 
install applications whenever they want. On one hand, such 
openness helps business flow by empowering workers to use 
information freely; on the other, it can risk the security and 
integrity of both computers and data as it opens a wide window 
for malware and malicious attacks. 

Often the first defensive step is to run anti-virus and
anti-malware protection software. These programs perform a
thorough cleaning of existing virus and malware infections,
returning the systems to a relatively stable state. However, they
are typically just behind the hacker curve. Computers are
vulnerable to newly released viruses or attacks until the
malware code is identified and the anti-virus agents are updated
on every machine.

This makes a "zero day attack" almost impossible to prevent
using anti-virus software. Because of this failure of
anti-malware, organizations choose to lock down their entire
networking environments.

Locking down a network client can mean a lot of different 
things. In this paper we refer to a client as being locked down if 
it is configured in such a way that prevents unauthorized 
applications from being installed or executed. 

It is obvious that locking down clients will stop users from 
installing or executing an application that contains spyware, a 
Trojan, a virus, or some other form of malware. This will 
result in a tremendous security improvement and business 
continuity. 

Locking down client machines can be done using different
methods. The problem with many of these methods, however,
is that they are impractical, costly, or place a heavy
burden on the network administrators.

In this paper, we develop a kernel-based solution for
Locking Applications on Linux Clients (LALC) that applies a
graylisting approach. LALC uses a central server that controls
the applications running on clients. The server is configured to
define clients' security levels and their associated allowed and
disallowed applications. Clients are configured to request
server permission before executing an application. The server
permits or denies client requests by comparing the hash value
of the requested application with pre-stored values. For
flexibility and ease of use, the solution provides a Server
Configuration Utility for managing client groups, their
security levels and their associated restriction lists.

This paper is organized as follows. In Section II, we review
the basic locking down approaches, and we discuss the design
of LALC in Section III. In Section IV we show how we
implemented and tested LALC, and we conclude the paper in
Section V.

II. LOCKING DOWN APPROACHES



Basically, there are three major approaches for locking
down client applications: blacklisting, whitelisting and
graylisting.

A. Blacklisting Approach

This approach applies the security premise "what is not
expressly defined to be prohibited must be allowed". In this
approach only those applications that have been defined as
unwanted (the blacklist) will not be executed; all other
applications will be allowed to run. Clearly this approach will
not defend against malicious applications not previously
identified in the blacklist.

B. Whitelisting Approach 

This is the reverse of blacklisting: it applies the
security premise "what is not expressly defined to be allowed
must be prohibited". Application whitelisting is emerging as
the security technology that gives a true defense-in-depth
capability, filling in the gaps that anti-virus was never designed
to cover. Application whitelisting is characterized by the
ability to identify authorized executables and associated files
and to treat as an attack any program or file that is not on the
authorized whitelist. Recent advances in application
whitelisting, including automatically approving files from
trusted sources to reduce administrative overhead and allowing
end users to personalize their endpoint for greater user
acceptance, have made application whitelisting an attractive
choice.

Application whitelisting is a technique gathering 
momentum in commercial security systems. Most implement 
additional access controls within the operating system to stop 
unauthorized programs from running. Products from companies 
such as CoreTrace [5], SolidCore [10] and Bit9 [2] all use 
application whitelists to create a safer working environment. 

C. Graylisting Approach 

This approach combines the previous two approaches; it
uses three lists: white, black and gray. It works by
identifying valid whitelisted applications and allowing only
those applications to run. All the applications in the blacklist
are not allowed to run. When an application is neither in the
white list nor in the black list, it is placed in the gray list for
further justification. This approach uses software authentication
to reduce the problem of malware and other unwanted software
[9].

III. LOCKING APPLICATIONS ON LINUX CLIENTS 
(LALC) 

LALC is a graylisting solution that restricts application 
execution on network Linux clients. The solution maintains 
three lists, a white list for applications that are authorized to 
run, a black list for applications that are solely prohibited and a 
gray list for applications that are neither white nor black. 

LALC deploys a client group restriction policy that allows the
establishment of different client groups with different
security levels. For system flexibility LALC implements three
security levels, namely Lockdown, Block-and-Ask and
Monitor. In the Lockdown level, only whitelisted applications are
allowed to run. In the Block-and-Ask level, a confirmation message
for executing the application is sent to the user when the
application is gray. In the Monitor level, gray applications
are allowed to execute without user confirmation. In all
security levels, gray applications are added to the gray list
for later analysis by the administrator.
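
The interaction between the three lists and the three security levels can be sketched as a small decision function. This is an illustrative reading of the description above, not LALC's actual code: the function names, the string constants and the choice to let the blacklist take precedence when a hash appears in both lists are all assumptions.

```python
def classify(app_hash, whitelist, blacklist):
    """Return 'white', 'black' or 'gray' for an application hash."""
    if app_hash in blacklist:  # assumption: blacklist wins on conflict
        return "black"
    if app_hash in whitelist:
        return "white"
    return "gray"


def decide(level, color, user_confirms=False):
    """Return True if execution is allowed under the given security level."""
    if color == "white":
        return True
    if color == "black":
        return False
    # Gray applications: the outcome depends on the group's security level.
    if level == "lockdown":
        return False
    if level == "block-and-ask":
        return user_confirms
    if level == "monitor":
        return True
    raise ValueError(f"unknown security level: {level}")


whitelist = {"aaa"}
blacklist = {"bbb"}
color = classify("ccc", whitelist, blacklist)
print(color, decide("lockdown", color), decide("monitor", color))
```

In every level the caller would still record gray applications in the gray list, which this sketch leaves out.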



A. LALC Components 

LALC is a client/server application. On the client side, we
built two components: a Loadable Kernel Module (LKM) that
intercepts client attempts to execute applications, and an Agent
program that calculates the hash value of the requested
application file using the MD5 algorithm and communicates with
the server. Although the Agent employs the MD5 algorithm, any
other hashing algorithm can be used instead.
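
As an illustration of the hashing step, the following sketch computes the MD5 digest of a file's content in chunks using Python's standard `hashlib` module. The paper's Agent is written in C++ with a different library, so this only shows the idea; the function name and chunk size are hypothetical.

```python
import hashlib


def file_md5(path, chunk_size=65536):
    """Return the hexadecimal MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so that large executables need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Swapping in another algorithm, as the text allows, would only require replacing `hashlib.md5()` with, for example, `hashlib.sha256()`.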

On the server side we build a Server program to receive 
client's requests and to generate responses, and a Server 
Configuration Utility to allow administrators to manage client 
groups, security levels and application lists. 

1) Client Components: Two components are deployed on 
each client; the Loadable Kernel Module (LKM) and the 
Agent. 

a) The Loadable Kernel Module (LKM): The LKM design
rests on two facts. First, a loadable kernel module is a
piece of code that can be dynamically loaded into or unloaded
from the Linux kernel, and once loaded it becomes part of the
kernel [8]. Second, the Linux kernel dedicates a specific system
call, namely execve, to handling client requests to the kernel for
executing a program file [1].

The LKM is designed to intercept client requests on behalf of
the original execve and to invoke the Agent. Based on the
Agent's response, the LKM may or may not allow the original
execve to handle the client application.

LKM comprises four functions: initialization(),
custom_execve(), write() and read().

• initialization(): When the LKM is loaded into the kernel it
executes initialization(). This function redirects
client calls from the original execve system call to the
custom_execve function inside the LKM. It performs the
redirection by replacing the execve address in the kernel
system call table with the address of custom_execve(),
after saving the original execve address.
initialization() also prepares a communication channel
to the Agent process via a /proc file: it creates a /proc
file and connects its read/write operations to read()
and write() inside the LKM. It also creates two buffers
to be used by the other LKM functions, namely the
Request Buffer and the Response Buffer. Generally, the
/proc file system is a method used for communication
between the kernel and user processes [9]. Fig. 1 shows
how the LKM initialization function works.

• custom_execve(): The purpose of this function is to
replace the original execve system call, so it is
executed whenever a client process attempts to
execute an application file. It saves the name of the
application file to be executed in the Request Buffer
and sets a flag to indicate that a request to execute an
application file is pending (Request_Pending = 1).
It then wakes up the Agent to handle the pending
request and puts itself into a waiting state. After
custom_execve is woken up by write(), it reads the
Response Buffer and resets the pending flag. Based on
the value in the buffer, custom_execve either allows
or denies the execution of the application. To allow
execution, custom_execve calls the original execve
system call; to deny it, it returns an error code on
behalf of the original execve system call. Fig. 2 shows
how the custom_execve function works.

Figure 1. LKM Initialization Function (flowchart steps: prepare the
Request and Response buffers; create a /proc entry; save the address of
the execve system call; put the address of the custom_execve function in
the system call table in place of the original system call)

Figure 2. LKM custom_execve function (flowchart steps: save the name of
the application to be executed in the Request Buffer; set
Request_Pending = 1; wake up the Agent process; read the Response
Buffer; set Request_Pending = 0; on denial set Ret_value = error code,
otherwise call the original execve system call and set Ret_value to its
return value; return Ret_value)

b) The Agent: The Agent program is a user-level
program that runs on the client machine. Its purpose is to
calculate the hash value of the application file content and to
forward it to the server together with the requesting client's
hostname and the application file name. The Agent then
forwards the server's response back to the LKM
custom_execve function by writing to the /proc file. Fig. 3
shows how the Agent works.



• read(): This function is executed when the Agent tries to
read the /proc file. It waits until the variable
Request_Pending is set. Once the variable is set, it
returns the contents of the Request Buffer, which is
the application file name, to the Agent.

• write(): This function is executed when the Agent tries to
write to the /proc file. It copies the message that the
Agent writes to the /proc file into the Response Buffer
and then wakes up the custom_execve function.
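
The request and response handshake between custom_execve(), read() and write() can be mimicked in user space to see how the pieces fit together. The sketch below is a simulation only, using Python threads and a condition variable in place of kernel wait queues and the /proc file; the class name, the "allow"/"deny" messages and the -1 error value are hypothetical, and it handles one pending request at a time.

```python
import threading


class LkmChannel:
    """User-space stand-in for the LKM buffers and the Request_Pending flag."""

    def __init__(self):
        self.request_buffer = None
        self.response_buffer = None
        self.request_pending = False
        self._cond = threading.Condition()

    def custom_execve(self, filename):
        with self._cond:
            self.request_buffer = filename      # save the application name
            self.request_pending = True         # Request_Pending = 1
            self._cond.notify_all()             # wake up the Agent
            self._cond.wait_for(lambda: self.response_buffer is not None)
            allowed = self.response_buffer == "allow"
            self.request_pending = False        # Request_Pending = 0
            self.response_buffer = None
        # 0 stands for "original execve was called", -1 for the error code.
        return 0 if allowed else -1

    def read(self):
        """Called by the Agent: block until a request is pending."""
        with self._cond:
            self._cond.wait_for(
                lambda: self.request_pending and self.response_buffer is None
            )
            return self.request_buffer

    def write(self, message):
        """Called by the Agent: store the response and wake custom_execve."""
        with self._cond:
            self.response_buffer = message
            self._cond.notify_all()


def agent_loop(channel, is_allowed):
    """Simplified Agent: the server round trip is replaced by `is_allowed`."""
    while True:
        name = channel.read()
        channel.write("allow" if is_allowed(name) else "deny")


channel = LkmChannel()
agent = threading.Thread(
    target=agent_loop, args=(channel, lambda name: name == "/bin/ls"), daemon=True
)
agent.start()
print(channel.custom_execve("/bin/ls"), channel.custom_execve("/tmp/unknown"))
```

The Agent thread is a daemon so that the example terminates even though the loop itself never returns.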



Figure 3. Agent program main loop (flowchart steps: send the request to
the server; receive the response from the server; write the response to
the LKM via the /proc file)

2) Server Components: Two components are deployed on
the server side: the Server program and the Server
Configuration Utility.

a) Server Program: The main task of the Server
program is to receive client requests from the Agent programs
and to respond to them. The server uses the hash value in the
request and the host name of the requesting client to generate
the permission response, and it uses the application file name
to identify the client in its log file.

The server generates the response by querying a
database that stores information about client groups, groups'
security levels and application lists. The server waits for
Agent connections on a specific TCP port; when an Agent
connects to that port, the server receives the request and sends
back a response. Fig. 4 shows how the server works.
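
A database lookup of this kind can be sketched with Python's standard `sqlite3` module. The schema is an assumption made for illustration: three tables for clients, client groups and restriction rules, with hypothetical column names, and rules keyed by application hash. An application with no matching rule is treated as gray.

```python
import sqlite3

# Hypothetical schema; the paper does not give column names.
SCHEMA = """
CREATE TABLE client_groups (
    group_id INTEGER PRIMARY KEY,
    group_name TEXT,
    security_level TEXT
);
CREATE TABLE clients (
    host_name TEXT PRIMARY KEY,
    group_id INTEGER REFERENCES client_groups(group_id)
);
CREATE TABLE restriction_rules (
    group_id INTEGER REFERENCES client_groups(group_id),
    app_hash TEXT,
    list_name TEXT
);
"""


def lookup(conn, host_name, app_hash):
    """Return (security_level, list_name) for a request, or None if the
    client is unknown. list_name is 'gray' when no rule matches."""
    row = conn.execute(
        "SELECT g.security_level, r.list_name "
        "FROM clients c "
        "JOIN client_groups g ON g.group_id = c.group_id "
        "LEFT JOIN restriction_rules r "
        "  ON r.group_id = g.group_id AND r.app_hash = ? "
        "WHERE c.host_name = ?",
        (app_hash, host_name),
    ).fetchone()
    if row is None:
        return None
    level, list_name = row
    return level, (list_name or "gray")


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO client_groups VALUES (1, 'lab', 'lockdown')")
conn.execute("INSERT INTO clients VALUES ('pc-01', 1)")
conn.execute("INSERT INTO restriction_rules VALUES (1, 'abc123', 'white')")
print(lookup(conn, "pc-01", "abc123"), lookup(conn, "pc-01", "ffff"))
```

The returned pair is what a decision step would need: the group's security level and the list the application falls into.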

b) Server Configuration Utility: The Server
Configuration Utility is a friendly graphical user interface that
enterprise administrators use to configure the Server to enforce
the enterprise restriction policy. They can use it to manage
clients, client groups, groups' security levels and application
lists.

Figure 4. Server program loop (flowchart steps: initialize the TCP port;
wait for Agent connections; read the group, security level and software
list from the database; determine the response; send the response to the
Agent)

The database manipulated by the configuration utility
consists of three tables that store information about clients,
client groups, and restriction rules.

The clients table contains information about each client:
the client host name and its corresponding group ID. The client
groups table stores group information: group ID, group name
and the group security level. The restriction rules table stores
the rules applied to each group. A rule specifies the list
(white or black) that applies to a specific application for a
particular group.

IV. IMPLEMENTATION AND TESTING

A. Implementation

Many tools were used to implement the system, and open
source tools were chosen throughout. Linux Ubuntu 7.04 was
chosen as the operating system for the client and server
machines. The LKM is written in C. The Agent, the Server and
the Server Configuration Utility are written in C++ with the Qt4
library. Qt is a library that helps in building GUI programs in
C++. The database management system used is SQLite, a
self-contained, serverless SQL database engine. The hashlib++
library was used to generate the hash of executable files in the
Agent program.

B. Testing

To test LALC, the LKM and the Agent program were
compiled on the client side. A shell script was written to
load the LKM and to run the Agent at startup, so that when the
client machine comes up the LKM and the Agent are ready.

The Server and the Server Configuration Utility were
compiled on the server machine and the Server was started.
Groups were added using the Server Configuration Utility
and clients were added to each group. The Lockdown
security level was chosen for the group and applications
were added to the whitelist.

We tested the system by attempting to launch two programs
from the client machine, one whitelisted and the other not.
The system performed exactly as expected: the whitelisted
program was executed while the other one was prohibited.

V. CONCLUSIONS

LALC brings an easy-to-use, kernel-integrated solution for
locking applications on Linux clients. Its simplicity makes
extending it fairly easy, while its integration into the Linux
kernel allows it to improve the Linux security features that
support enterprise needs.

Abstract - This paper presents the benefits of using the
coiflet wavelet for feature extraction from surface
roughness images. The extracted features are learnt by
the Locally Weighted Projection Regression (LWPR)
network method. The image captured through a
charge-coupled device (CCD) camera is preprocessed
to remove noise and enhance image quality so that
pixel details become clearer. The image is
decomposed using the coiflet wavelet. Four levels of
decomposition are performed to obtain detailed information.
An entropy measure is applied, and LWPR is subsequently
used to train on the calculated entropy. The target values
are labeled according to whether the surface roughness is
within the limits or not. The values are trained using LWPR
and a set of final weights is obtained. Using these final
weights, different portions of the image are analyzed
to verify whether the roughness is within the limit or not.



Keywords- Locally weighted projection
regression network method (LWPR), discrete wavelet
transform (DWT)



1. INTRODUCTION 



Measurement of a rough surface is based on the grey
levels corresponding to the surface texture. The deeper a
valley, the darker the corresponding pixel; the higher
a peak, the brighter the corresponding area in the
image. Modern instruments can give a three-dimensional
(3D) measure of a surface. There is no single technique
that can be used to entirely characterize a texture. An
image is usually analyzed at a single scale, a limitation
that can be removed by employing a multiscale
representation of the textures such as the wavelet
transform. Wavelets have already been applied
successfully as a tool for characterizing engineered
surfaces with one-dimensional (1D) profiles, and also
in 2D for some particular engineering applications.
Industrial inspection is a very popular field for using
wavelets. They are well suited to detecting defects such
as scratches on a uniform texture. It should be
mentioned that for special monitoring tasks, the images
to be processed often come from a CCD camera.

Surface finish is the visible evidence of tool 
marks, or the lack of them, on the machined surface of 
a work piece. Surface finish is a characteristic of any 
machined surface [1-5]. It is sometimes called 
surface texture or roughness. The design engineer is 
usually the person who decides what the surface 
finish of a work piece should be, based on what the 
work piece is supposed to do. Here are a few examples 
of what the engineer considers when applying a 
surface finish specification: 

• Good surface finishes increase the wear 
resistance of two work pieces in an assembly 

• Good surface finishes reduce the friction 
between two work pieces in an assembly 
Surface finishes are usually specified with a "check 
mark" on the blueprint, as shown in Figure 1. 
Surface finishes are specified in microinches and are 
located on the left side of the symbol, above the check 
mark "V" shown in Figure 1. The waviness requirement 
(if specified) is usually given in thousandths of an inch 
and is located on the top right of the symbol. In the 
example it is the value ".0015". The roughness width 
requirement (if specified) is usually given in 
thousandths of an inch and is located on the bottom 
right of the symbol. In the example it is the value 
".002". The lay direction requirement (if specified) is 
usually represented by a symbol [6-10] and is located 
right below the roughness width requirement. In the 
example it is the symbol for perpendicularity. The 
graphic below shows the rest of the symbols [11]. 



[Figure 1 labels: .0015 (waviness); (microinches); .002 (roughness width); perpendicularity symbol] 

Fig. 1 Surface finish representation 

2. WAVELETS (WT) 

The WT was developed as an alternative to 
the short time Fourier transform (STFT). A wavelet is 
a waveform of limited duration that has an average 
value of zero. Compare wavelets with sine waves: 
sinusoids do not have limited duration, since they 
extend from minus to plus infinity, and whereas 
sinusoids are smooth and predictable, wavelets tend to 
be irregular and asymmetric [12]. Wavelet analysis is 
the breaking up of a signal into shifted and scaled 
versions of the original (or mother) wavelet. 
Mathematically, the process of Fourier analysis is 
represented by the Fourier transform: 






$$F(\omega) = \int_{-\infty}^{\infty} f(t)\, e^{-j\omega t}\, dt \quad (1)$$ 



which is the sum over all time of the signal f(t) 
multiplied by a complex exponential. The results of 
the transform are the Fourier coefficients, which, 
when multiplied by a sinusoid of the corresponding 
frequency, yield the constituent sinusoidal components 
of the original signal. Graphically, the process looks like: 




[Diagram: signal, Fourier transform, constituent sinusoids of different frequencies] 

Fig. 2 Wavelet 



The continuous wavelet transform (CWT) (Figure 3) 
is defined as the sum over all time of the signal 
multiplied by scaled, shifted versions of the wavelet 
function: 

$$C(\text{scale}, \text{position}) = \int_{-\infty}^{\infty} f(t)\, \psi(\text{scale}, \text{position}, t)\, dt \quad (2)$$ 

The result of the CWT is many wavelet coefficients 
C, which are a function of scale and position. 
Multiplying each coefficient by the appropriately 
scaled and shifted wavelet yields the constituent 
wavelets of the original signal: 




[Diagram: signal, wavelet transform, constituent wavelets of different scales and positions] 

Fig. 3 Continuous wavelet 

Scaling 



Scaling a wavelet simply means stretching (or 
compressing) it. The scale factor works exactly the 
same with wavelets. The smaller the scale factor, the 
more "compressed" the wavelet. 



Shifting 

Shifting a wavelet simply means delaying (or 
hastening) its onset. Mathematically, delaying a 
function f(t) by k is represented by f(t - k). 

Coiflet wavelet 

Although many different wavelets exist, the coiflet 
wavelet has been considered here. Its wavelet function 
has 2N moments equal to 0 and its scaling function has 
2N-1 moments equal to 0. The two functions have a 
support of length 6N-1. 

The features are obtained from the approximation 
and details of the 4th level by using the following 
equations, where $x_i$ denotes the samples of the 
approximation or detail coefficients: 

$$V_1 = \frac{1}{d} \sum_{i=1}^{d} x_i \quad (3)$$ 

where d = number of samples in a frame and 
$V_1$ = mean value of the approximation. 

$$V_2 = \sqrt{\frac{1}{d} \sum_{i=1}^{d} (x_i - V_1)^2} \quad (4)$$ 

where $V_2$ = standard deviation of the approximation. 

$$V_3 = \max_i (x_i) \quad (5)$$ 
$$V_4 = \min_i (x_i) \quad (6)$$ 
$$V_5 = \lVert x \rVert^2 \quad (7)$$ 

where $V_5$ = energy value of frequency. 
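
The five features above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `subband_features` is hypothetical, the input is assumed to be one array of wavelet coefficients (approximation or details) that has already been computed, and V2 is taken to be the population standard deviation, as the text labels it.

```python
import numpy as np

def subband_features(coeffs):
    """Return [V1, V2, V3, V4, V5] for one array of wavelet coefficients."""
    x = np.asarray(coeffs, dtype=float).ravel()
    d = x.size                                # number of samples in the frame
    v1 = x.sum() / d                          # V1: mean value
    v2 = np.sqrt(((x - v1) ** 2).sum() / d)   # V2: standard deviation (assumed population form)
    v3 = x.max()                              # V3: maximum
    v4 = x.min()                              # V4: minimum
    v5 = np.linalg.norm(x) ** 2               # V5: energy (squared Euclidean norm)
    return np.array([v1, v2, v3, v4, v5])

if __name__ == "__main__":
    # Toy 2x2 "approximation matrix" standing in for real level-4 coefficients.
    print(subband_features([[1, 2], [3, 4]]))
```

The resulting five-element vector is what would be passed to the learner as one input pattern.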



3. LOCALLY WEIGHTED PROJECTION 
REGRESSION (LWPR) 



LWPR achieves better results in nonlinear function 
approximation in high dimensional spaces. It is 
insensitive to redundant data. It uses linear models 
locally [13, 14]. Univariate regressions in selected 
directions are used in the input space. The 
nonparametric local learning system learns rapidly. It 
uses second order learning methods based on 
incremental training. Weight adjustments are done 
based on local information only. Training LWPR is 
done as follows, 

The 5 features obtained are used as inputs to the 
LWPR, and the target values for training each surface 
roughness type are based on labeling. 

1. Input the features extracted from the wavelet. 

2. Initialize LWPR using the diagonal distance 
matrix a, norm, meta rate and initial_X. Many 
other variables can be initialized or made 
constants depending upon the requirements. 

3. Create random numbers. 

4. Choose the input and target output of a pattern. 

5. Find the global mean and variance of the patterns. 

6. Normalize the input and output. 

7. Compute the weight. 

8. Check if a new random field has to be added. 

9. Find the mean square error between the target and 
the estimated values. 

10. Repeat steps 5 to 9 until all the patterns are 
presented. 



4. SCHEMATIC DIAGRAM 



[Flowchart, training using neurowavelet: image of machined surface, preprocess the image suitably, decompose using coiflet wavelet, get wavelet coefficients, label the features, train using LWPR and store final weights] 

[Flowchart, testing for the actual surface roughness: image of machined surface, preprocess the image suitably, decompose using coiflet wavelet, get wavelet coefficients, process coefficients with final weights, check for surface roughness range] 

Fig. 4 Training and testing 
5. IMPLEMENTATION 

Training 

1. Read each image. 

2. Remove noise. 

3. Enhance the image. 

4. Decompose by discrete wavelet transform (DWT) of 
type coiflet. 

5. Decompose to 4 levels. 

6. Find features from the approximation matrix at the 
4th level of decomposition. 

7. Label the features based on the type of surface 
roughness measured for the machined work piece 
using a profilometer. 

8. Repeat steps 1 to 7 for different types of 
acceptable and unacceptable roughness values. 

9. Train the LWPR using the inputs and corresponding 
labels obtained in the previous steps. 

10. Store the final weights in a file. 



Testing 

1. Read each image. 

2. Remove noise. 

3. Enhance the image. 

4. Decompose by discrete wavelet transform (DWT) of 
type coiflet. 

5. Decompose to 4 levels. 

6. Find features from the approximation matrix at the 
4th level of decomposition. 

7. Process with the final weights of LWPR. 

8. Classify the roughness. 

6. EXPERIMENT DETAILS 

A milling machine was used to machine flat 
specimens under the following conditions: 

M1,F150,S800,.5DOC,49DIA CUTTER 
M2,F150,S800,1DOC,49DIA CUTTER 
M3,F150,S1000,.5DOC,49DIA CUTTER 
M4,F150,S1000,.8DOC,49DIA CUTTER 
M5,F200,S800,.5DOC,49DIA CUTTER 
M6,F200,S800,.8DOC,49DIA CUTTER 
M7,F200,S1000,.5DOC,49DIA CUTTER 
M8,F200,S1000,.8DOC,49DIA CUTTER 

7. RESULTS 

Sample images 

[Image panels (a) to (e)] 

Fig. 5 Images used for training and testing LWPR 
Fig. 6 Surface roughness under magnification 

Fig. 7 Histogram of an image with surface roughness 

Fig. 8 Surface roughness pattern 



Feature patterns are developed from the surface 
roughness images obtained after machining. The 
patterns are separated into training and testing 
patterns. The patterns are labeled with ranges of 
surface roughness values. 



8. CONCLUSION 

This work has focused on estimating 
the surface roughness values from the image of a 
machined surface in milling. The coiflet wavelet is used 
for image decomposition and a radial basis function 
network for learning the training patterns to obtain 
final weights for finding roughness from new images. 
The performance of this work is only 95%. The 
performance has to be improved by changing the 
topology of the LWPR. 



Abstract: This paper proposes a new technique to extract 
palmprint features based on fractal codes. The 
palmprint feature representation is formed from the positions 
of the range blocks and the directions between the positions 
of the range and domain blocks of the fractal codes. Each 
palmprint representation is divided into a set of n blocks, and 
the mean value of each block is used to form the feature 
vector. The normalized correlation metric is used to measure 
the degree of similarity between two feature vectors of 
palmprint images. We collected 1050 palmprint images, 5 
samples from each of 210 persons. Experimental results show 
that the proposed method can achieve an acceptable accuracy 
rate, with FRR = 1.754 and FAR = 0.699. 

Keywords: biometrics, fractal codes, fractal dimension, 
feature extraction, palmprint recognition 



I. INTRODUCTION 

Personal verification has become an important and 
highly demanded technique for security access systems in 
this information era. Traditional automatic personal 
recognition can be divided into two categories: token-based, 
such as a physical key, an ID card, or a passport, 
and knowledge-based, such as a password or a PIN. 
However, these approaches have some limitations. In the 
token-based approach, the "token" can be easily stolen or 
lost. In the knowledge-based approach, the "knowledge" 
can be guessed or forgotten [21]. In order to reduce the 
security problems caused by traditional methods, biometric 
verification techniques have been intensively studied and 
developed to improve the reliability of personal verification. 
Biometric-based approaches use human physiological or 
behavioral features to identify a person. The most widely 
used biometric features are fingerprints, and the most 
reliable are irises. However, it is very difficult to 
extract small minutiae features from unclear fingerprints, 
and iris input devices are very expensive [19]. Other 
biometric features such as face, voice, hand geometry, 
and handwriting are less accurate. Faces and voices can be 
mimicked easily, and hand geometry and handwriting can be 
faked easily. 

The palmprint is relatively new in physiological 
biometrics [18]. There are many unique features in a 
palmprint image that can be used for personal recognition. 
Principal lines, wrinkles, ridges, minutiae points, singular 
points and texture are regarded as useful features for 
palmprint representations [21]. A palmprint has several 
advantages compared to other available features: 
low-resolution images can be used, low-cost capture devices 
can be used, it is very difficult or impossible to fake 
palmprints, and their characteristics are stable and unique 
[18]. 

Recently, many verification/identification technologies 
using palmprint biometrics have been developed 
[2],[3],[4],[5],[11],[12],[13],[18],[21]. Zhang et al. [21] 
applied a 2-D Gabor filter to obtain the texture features of 
palmprints. Pang et al. [13] used pseudo-orthogonal 
moments to extract the features of the palmprint. Li et al. [12] 
transformed the palmprint from the spatial to the frequency 
domain using the Fourier transform and then computed ring 
and sector energy features. Connie et al. [2] extracted the 
texture features of the palmprint using PCA and ICA. Wu et 
al. [18] extracted line feature vectors (LFV) using the 
magnitudes and orientations of the gradient of the points 
on palm-lines. Kumar et al. [11] combined palmprints 
and hand geometries in a verification system. Each 
palmprint was divided into overlapping blocks and the 
standard deviation of each block was used to form 
the feature vector. 

In this paper, we propose a new technique to extract the 
features of the palmprint based on fractal codes. This 
technique differs from the methods in [4] and [5]. 



II. IMAGE ACQUISITION 

All palm images were captured using a Sony DSC P72 
digital camera with a resolution of 640 x 480 pixels. Each 
person was requested to put his/her left hand, palm down, 
on a board with a black background. There are some pegs on 
the board to control the hand orientation, translation, and 
stretching. A sample of the hand and peg positions on the 
black board is shown in Figure 1 (a). 



III. PALMPRINT EXTRACTION AND 
NORMALIZATION 

This paper uses a new technique to extract the ROI 
(region of interest) of the palmprint. This technique applies 
the center of mass (centroid) method in two steps. The 
procedure can be explained as follows. 

a. The gray level hand image is thresholded to obtain the 
binary hand image. The threshold value is computed 
automatically using the Otsu method. A median filter is 
used to remove white pixels (non-object pixels) outside 
the hand object. 
b. Each of the acquired hand images needs to be aligned 
in a preferred direction so as to capture the same 
features for matching. The moment orientation method 
is applied to the binary image to estimate the 
orientation of the hand. In this method, the angle of 
rotation ($\theta$) is the difference between the normal 
axis and the major axis of the ellipse, and can be 
computed as follows. 

$$\theta = \frac{1}{2}\tan^{-1}\left(\frac{2\mu_{1,1}}{\mu_{2,0} - \mu_{0,2}}\right) \quad (1)$$ 

$$\mu_{p,q} = \sum_{m}\sum_{n} (m - \bar{m})^{p}\,(n - \bar{n})^{q} \quad (2)$$ 

where $\mu_{p,q}$ represents the (p,q)th central moment, and 
$(\bar{m}, \bar{n})$ represents the center of area, defined as 

$$\bar{m} = \frac{1}{N}\sum_{m}\sum_{n} m, \qquad \bar{n} = \frac{1}{N}\sum_{m}\sum_{n} n \quad (3)$$ 

where N represents the number of object pixels and the 
sums run over the object pixels. The grayscale and the 
binary images are then rotated by $\theta$ degrees. 

c. A bounding box operation is applied to the rotated 
binary image to get the smallest rectangle that 
contains the binary hand image. The original hand 
image, the binarized image, and the bounded image are 
shown in Figure 1 (a), (b), and (c), respectively. 

d. The centroid of the bounded image is computed using 
equation (3) and, based on this centroid, the bounded 
binary and original images are segmented to 200 x 
200 pixels. The segmented image and its centroid 
position are shown in Figure 1 (d) and (e). 

e. The centroid of the segmented binary image is 
computed and, based on this centroid, the ROI of the 
grayscale palmprint image is cropped with a size of 
128 x 128 pixels. The first and the second positions of 
the centroid in the binary and gray level images are 
shown in Figure 1 (f) and (g). 
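
The orientation estimate in step b can be sketched with NumPy as follows. This is an illustrative sketch under stated assumptions rather than the authors' code: the function name `hand_orientation` is hypothetical, m is taken as the row index and n as the column index, and `np.arctan2` is used in place of the plain inverse tangent so that the case where the two second-order moments are equal does not divide by zero.

```python
import numpy as np

def hand_orientation(binary):
    """Estimate orientation (degrees) and centroid of the object pixels in a binary image."""
    rows, cols = np.nonzero(binary)           # coordinates (m, n) of the object pixels
    if rows.size == 0:
        raise ValueError("image contains no object pixels")
    m_bar, n_bar = rows.mean(), cols.mean()   # center of area
    dm, dn = rows - m_bar, cols - n_bar
    mu11 = (dm * dn).sum()                    # central moment (1,1)
    mu20 = (dm ** 2).sum()                    # central moment (2,0)
    mu02 = (dn ** 2).sum()                    # central moment (0,2)
    # arctan2 is a substitution for tan^-1 of the ratio, chosen for robustness.
    theta = 0.5 * np.arctan2(2.0 * mu11, mu20 - mu02)
    return np.degrees(theta), (m_bar, n_bar)

if __name__ == "__main__":
    img = np.zeros((9, 9), dtype=np.uint8)
    img[np.arange(1, 8), np.arange(1, 8)] = 1  # a diagonal stroke
    print(hand_orientation(img))
```

The returned centroid is the same quantity that steps d and e use to place the 200 x 200 and 128 x 128 crops.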



This method is simple. It has been tested on 1050 
palmprint images acquired from 210 persons, and the 
results show that it is reliable. 

Before the feature extraction phase, the extracted ROIs 
are normalized using the normalization method in [11] to 
reduce possible imperfections in the image due to 
non-uniform illumination. The method is as follows: 



$$I'(x,y) = \begin{cases} \phi_d + \lambda & \text{if } I(x,y) > \phi \\ \phi_d - \lambda & \text{otherwise} \end{cases} \quad (4)$$ 

$$\lambda = \sqrt{\frac{\rho_d\,\{I(x,y) - \phi\}^2}{\rho}} \quad (5)$$ 

where $I$ and $I'$ represent the original grayscale palmprint 
image and the normalized image, respectively, $\phi$ and $\rho$ 
represent the mean and variance of the original image, 
respectively, and $\phi_d$ and $\rho_d$ are the desired values for 
the mean and variance, respectively. This research uses 
$\phi_d = 180$ and $\rho_d = 180$ for all experiments. 
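
A minimal NumPy sketch of this mean and variance normalization is given below. The function name `normalize_roi` and the handling of a perfectly flat image are assumptions made for illustration; the defaults of 180 follow the values stated above, and no clipping to the 8-bit range is applied.

```python
import numpy as np

def normalize_roi(image, desired_mean=180.0, desired_var=180.0):
    """Shift and scale gray levels so the image has the desired mean and variance."""
    img = np.asarray(image, dtype=float)
    phi = img.mean()                          # mean of the original image
    rho = img.var()                           # variance of the original image
    if rho == 0:
        # Assumed edge case: a flat image carries no contrast to rescale.
        return np.full_like(img, desired_mean)
    lam = np.sqrt(desired_var * (img - phi) ** 2 / rho)
    # Pixels brighter than the mean move up by lam, the others move down.
    return np.where(img > phi, desired_mean + lam, desired_mean - lam)

if __name__ == "__main__":
    roi = np.arange(16, dtype=float).reshape(4, 4)
    out = normalize_roi(roi)
    print(out.mean(), out.var())
```

Because the two branches together amount to a linear rescaling about the mean, the output has exactly the desired mean and variance, which is the illumination-compensating effect the normalization is meant to provide.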
Figure 1. Extraction of palmprint: (a) original image, (b) 
binary image of (a), (c) bounded object, (d) and (e) 
position of the first centroid in the segmented binary and 
gray level images, respectively, (f) and (g) position of the 
second centroid in the segmented binary and gray level 
images, respectively. 



IV. FEATURES EXTRACTION 

There are three main steps to extract the palmprint 
features based on fractal codes proposed in this paper. 
These steps can be explained as follows. 

A. Extraction of fractal codes of palmprint images 

Fractal codes of palmprint images are obtained using 
the partitioned iterated function system (PIFS) method. In 
the PIFS method, each image is partitioned into range 
blocks and domain blocks. The size of the domain blocks 
is usually larger than the size of the range blocks. The 
relation between a range block ($R_i$) and its domain 
block ($D_i$) is written as 

$$R_i = w_i(D_i) \quad (6)$$ 

where $w_i$ is a contractive mapping that describes the 
similarity relation between $R_i$ and $D_i$, and is usually 
defined as an affine transformation as below: 
$$w_i \begin{bmatrix} x_i \\ y_i \\ z_i \end{bmatrix} = \begin{bmatrix} a_i & b_i & 0 \\ c_i & d_i & 0 \\ 0 & 0 & s_i \end{bmatrix} \begin{bmatrix} x_i \\ y_i \\ z_i \end{bmatrix} + \begin{bmatrix} e_i \\ f_i \\ o_i \end{bmatrix} \quad (7)$$ 

where $x_i$ and $y_i$ represent the top-left coordinate of $R_i$, and 
$z_i$ is the brightness value of its block. The matrix elements 
$a_i$, $b_i$, $c_i$, and $d_i$ are the parameters of the spatial rotations 
and flips of $D_i$, $s_i$ is the contrast scaling, and $o_i$ is the 
luminance offset. The vector elements $e_i$ and $f_i$ are spatial 
offsets. In this paper, the domain size is twice the range size, 
so the values of $a_i$, $b_i$, $c_i$, and $d_i$ are 0.5. The 
actual fractal code $f_i$ below is usually used in practice [19]. 

$$f_i = \left( (x_{D_i}, y_{D_i}),\ (x_{R_i}, y_{R_i}),\ size_i,\ \theta_i,\ s_i,\ o_i \right) \quad (8)$$ 

where $(x_{R_i}, y_{R_i})$ and $(x_{D_i}, y_{D_i})$ represent the top-left 
coordinate positions of the range block and domain block, 
respectively, and $size_i$ is the size of the range block. The 
fractal codes of a palmprint image are denoted as follows: 



$$F = \bigcup_{i=1}^{N} f_i \quad (9)$$ 

where N represents the number of fractal codes. The 
inequality below is used to indicate whether the 
range block and the relevant domain block are similar or not. 

$$d(R, D) \le \varepsilon \quad (10)$$ 

where d(R,D) represents the rmse value and $\varepsilon$ is the 
threshold (tolerance) value. The range block and the relevant 
domain block are similar if d(R,D) is less than or equal to 
$\varepsilon$. Otherwise, the blocks are regarded as not similar. 
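
The similarity test in (10) can be illustrated with a short sketch. Several details here are assumptions rather than the authors' method: the domain block is shrunk to the range size by 2 x 2 averaging (consistent with the domain being twice the range size), spatial rotations and flips are omitted, the contrast scaling s and luminance offset o are passed in rather than fitted, and the names `downsample2` and `blocks_similar` are hypothetical.

```python
import numpy as np

def downsample2(block):
    """Shrink a block by a factor of 2 per axis using 2x2 averaging (assumed scheme)."""
    b = np.asarray(block, dtype=float)
    h, w = b.shape
    return b.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def blocks_similar(range_block, domain_block, s, o, tol):
    """Return (is_similar, rmse) for a range block and a transformed domain block."""
    r = np.asarray(range_block, dtype=float)
    approx = s * downsample2(domain_block) + o      # contrast scaling and luminance offset
    rmse = float(np.sqrt(np.mean((r - approx) ** 2)))
    return rmse <= tol, rmse

if __name__ == "__main__":
    domain = np.arange(16, dtype=float).reshape(4, 4)
    rng_block = 0.5 * downsample2(domain) + 10.0    # a range block the code matches exactly
    print(blocks_similar(rng_block, domain, s=0.5, o=10.0, tol=3.0))
```

In a full PIFS encoder this check would be evaluated for many candidate domain blocks per range block, keeping the first or best candidate that satisfies the tolerance.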

B. Palmprint features representation 

The first step of this method is the forming of the angle 
image A as follows. 

$$A(j, k) = \alpha_i, \quad j = 1, 2, 3, \ldots, M_1, \quad k = 1, 2, 3, \ldots, M_2 \quad (11)$$ 

$$\alpha_i = \arctan\left(\frac{y_D - y_R}{x_D - x_R}\right) \ \text{if } j = x_R \text{ and } k = y_R; \quad \text{otherwise, } \alpha_i = 0 \quad (12)$$ 

where $(x_D, y_D)$ represents the top-left coordinate of the 
domain block (see formula (8)) and $\alpha_i$ represents the angle 
between the range and domain blocks. The angle image is not 
a binary image representation. The criteria below are added 
to compute the direction $\alpha_i$. 

if $x_R < x_D$ and $y_R \le y_D$ then $\alpha_i = \alpha_i$ 

if $x_R > x_D$ and $y_R > y_D$ then $\alpha_i = 180 - \alpha_i$ 

if $x_R > x_D$ and $y_R < y_D$ then $\alpha_i = 180 + \alpha_i$ 

if $x_R < x_D$ and $y_R \ge y_D$ then $\alpha_i = 360 - \alpha_i$ 

if $x_R = x_D$ and $y_R > y_D$ then $\alpha_i = 90$ 

if $x_R = x_D$ and $y_R < y_D$ then $\alpha_i = 270$ 

(13) 



The criterion $size_i = \min(size)$ means that the palmprint 
feature representation is formed in practice using the 
coordinates of the smallest range blocks. Later, the 
representation is filtered as follows. 

$$I'(x,y) = I(x,y) * h(x,y)_{m \times n} \quad (14)$$ 

where h(x,y) is a filter all of whose components are one. 
Figure 2(b) shows the palmprint feature image of Figure 2(a). 

C. Palmprint feature vector 

The palmprint feature vector (V) is obtained by dividing 
the palmprint image into 16 x 16 blocks and computing the 
mean value of each block, which gives the feature vector 
$V = [v_1, v_2, \ldots, v_N]$, where N = 256 and $v_i$ is the 
mean value of block i. 
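
The block-mean step can be sketched with a NumPy reshape. This is an illustration under assumptions: the function name `block_mean_features` is hypothetical, the blocks are taken to be non-overlapping and visited in row-major order, and the image sides are assumed to be divisible by the grid size (a 128 x 128 image gives 8 x 8 pixel blocks).

```python
import numpy as np

def block_mean_features(image, grid=16):
    """Split an image into grid x grid non-overlapping blocks and return their means."""
    img = np.asarray(image, dtype=float)
    h, w = img.shape
    if h % grid or w % grid:
        raise ValueError("image sides must be divisible by the grid size")
    bh, bw = h // grid, w // grid             # block height and width in pixels
    blocks = img.reshape(grid, bh, grid, bw)  # axes: block row, row in block, block col, col in block
    return blocks.mean(axis=(1, 3)).ravel()   # row-major vector of grid * grid means

if __name__ == "__main__":
    img = np.arange(128 * 128, dtype=float).reshape(128, 128)
    v = block_mean_features(img)
    print(v.shape)
```

With the default grid the result has the 256 components described above.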




Figure 2. Palmprint feature extraction: (a) original image, 
(b) image I, (c) image I', (d) block feature representation 

Figure 2 (d) shows the palmprint feature representation 
in 16 x 16 sub-blocks. Figure 3 shows an example of three 
groups of palmprints: from the same palm, and from palms 
with similar and different line structures. The features of 
these palmprints are plotted in Figure 4. The results show 
that the features of three palm images from the same person 
are closer to each other than the features of three palm 
images from different persons with similar or different line 
structures. 



V. PALMPRINT FEATURE MATCHING 

The degree of similarity between two palmprint 
features is computed as follows: 

$$d_{rs} = 1 - \frac{(x_r - \bar{x}_r)(x_s - \bar{x}_s)'}{\left[(x_r - \bar{x}_r)(x_r - \bar{x}_r)'\right]^{1/2} \left[(x_s - \bar{x}_s)(x_s - \bar{x}_s)'\right]^{1/2}} \quad (15)$$ 

where $\bar{x}_r$ and $\bar{x}_s$ are the means of the palmprint 
features $x_r$ and $x_s$, respectively. The above equation 
computes one minus the normalized correlation between the 
palmprint feature vectors $x_r$ and $x_s$. The values of $d_{rs}$ 
are between 0 and 2. $d_{rs}$ will be close to 0 if $x_r$ and 
$x_s$ are obtained from two images of the same palmprint. 
Otherwise, $d_{rs}$ will be far from 0. 
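
The matching score can be written directly from its definition as one minus the normalized correlation. The sketch below is illustrative: the function name `matching_score` is hypothetical, and raising an error for a constant feature vector is an assumed design choice, since the correlation is undefined in that case.

```python
import numpy as np

def matching_score(x_r, x_s):
    """Return one minus the normalized correlation of two feature vectors (range 0 to 2)."""
    r = np.asarray(x_r, dtype=float).ravel()
    s = np.asarray(x_s, dtype=float).ravel()
    r = r - r.mean()                          # remove each vector's own mean
    s = s - s.mean()
    denom = np.sqrt(r @ r) * np.sqrt(s @ s)
    if denom == 0:
        raise ValueError("correlation is undefined for a constant feature vector")
    return 1.0 - (r @ s) / denom

if __name__ == "__main__":
    print(matching_score([1, 2, 3], [2, 4, 6]))   # perfectly correlated vectors
    print(matching_score([1, 2, 3], [3, 2, 1]))   # perfectly anti-correlated vectors
```

Scores near 0 indicate strongly correlated feature vectors, and scores near 2 indicate strongly anti-correlated ones.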

Figure 4 shows a comparison of the feature components of 
the palmprints shown in Figure 3, and their scores are listed 
in Table 1. The matching scores of group A are close to 0, 
and the matching scores of groups B and C are far from 0. 
The average scores of groups A, B, and C are 0.1762, 
0.5057, and 0.6452, respectively. It is easy to distinguish 
group A from groups B and C using these scores. 




(a1) (a2) (a3) 

Group A: palmprints from the same person 



http://sites.google.com/site/ijcsis/ 
ISSN 1947-5500 



(IJCSIS) International Journal of Computer Science and Information Security, 
Vol.9, No. 2, February 2011 




Group B: palmprints from different persons with similar line 
structure 

(c1) (c2) (c3) 

Group C: palmprints from different persons with different line 
structure 

Figure 3. Example of three groups of palmprints 

Table 1. Matching scores of groups A, B, and C in Figure 3 

        a1       a2       a3       Average 
a1      -        0.1957   0.1404   0.1762 
a2      0.1957   -        0.1925 
a3      0.1404   0.1925   - 

        b1       b2       b3       Average 
b1      -        0.5352   0.3056   0.5057 
b2      0.5352   -        0.6763 
b3      0.3056   0.6763   - 

        c1       c2       c3       Average 
c1      -        0.6900   0.6177   0.6452 
c2      0.6900   -        0.6280 
c3      0.6177   0.6280   - 




VI. EXPERIMENTS AND RESULTS 

We collected palm images from 210 persons of both 
sexes and different ages, 5 samples from each person, so 
our database contains 1050 images. The resolution of each 
hand image is 640 x 480 pixels. The palmprint images, of 
size 128 x 128 pixels, were automatically extracted from the 
hand images as described in Section III. The averages of the 
first three images from each user were used for training 
and the rest were used for testing. 

The performance of the verification system is 
obtained by matching each of the testing palmprint images 
with all of the training palmprint images in the database. A 
matching is counted as correct if the two palmprint images 
are from the same palm and as incorrect otherwise. 



Figure 4. Comparison of the feature components of the 
palmprint groups shown in Figure 3. Panels (a), (b), and (c) 
plot the feature components of groups A, B, and C, 
respectively, against the feature component number. Red, 
green, and blue are the first, second, and third palmprint in 
each group, respectively. 
Figure 5. Distribution of three feature components 
of 1050 palmprints in feature space 

Figure 6. Performance of the verification system: (a) genuine 
and imposter distributions, (b) FAR/FRR/EER with various 
thresholds 

Table 2. FRR/FAR with various threshold values 

Threshold   FRR      FAR 
0.4386      2.0734   0.4734 
0.4586      1.9139   0.5158 
0.4626      1.7544   0.6998 
0.4746      1.4354   0.9160 
0.4786      1.2759   1.3552 
0.4986      1.1164   2.1480 
0.5386      1.1164   2.2881 



Figure 6 (a) shows the probability distributions of the 
genuine and imposter parts with tolerance value = 3 and 
feature vector length = 256 (16 x 16 blocks). The genuine 
and imposter parts are estimated from the correct and 
incorrect matching scores, respectively. The false 
acceptance rates (FAR) and false rejection rates (FRR) at 
various thresholds are shown in Figure 6 (b). The equal 
error rate (EER) of the verification system is 1.2758. Table 2 
shows the performance (FAR/FRR) of the system at some 
threshold values. 
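
How FAR and FRR follow from the matching scores at a given threshold can be sketched as below. The decision rule is an assumption, since the text does not state it explicitly: a pair is accepted when its score is less than or equal to the threshold, which is consistent with scores near 0 indicating the same palm. The function name `far_frr` and the toy scores are hypothetical, and rates are returned in percent.

```python
import numpy as np

def far_frr(genuine_scores, imposter_scores, threshold):
    """Return (FAR, FRR) in percent for distance-like scores at one threshold."""
    genuine = np.asarray(genuine_scores, dtype=float)
    imposter = np.asarray(imposter_scores, dtype=float)
    # False rejection: a genuine pair whose score exceeds the threshold.
    frr = 100.0 * np.mean(genuine > threshold)
    # False acceptance: an imposter pair whose score falls at or below the threshold.
    far = 100.0 * np.mean(imposter <= threshold)
    return far, frr

if __name__ == "__main__":
    genuine = [0.10, 0.20, 0.30, 0.60]        # toy scores, not the paper's data
    imposter = [0.40, 0.50, 0.70, 0.80]
    for t in (0.35, 0.45, 0.65):
        print(t, far_frr(genuine, imposter, t))
```

Sweeping the threshold this way yields FAR/FRR curves of the kind the table summarizes; the EER is read off where the two rates are equal.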

The main advantage of using the PIFS code in this paper 
is that both the palmprint features and the palmprint image 
can be obtained directly from the compressed domain (fractal 
code). 



VII. CONCLUSIONS AND FUTURE WORK 

In this paper, we introduced a feature extraction and 
representation method based on fractal characteristics 
for palmprint verification. The experimental results 
show that the proposed method can achieve an acceptable 
accuracy rate with FRR = 1.7544 and FAR = 0.6998. In the 
future, we will combine the proposed method with the wavelet 
transform to extract the features of the palmprint while 
retaining the block operation. 
Abstract: Breast cancer is one of the major causes of fatality 
among women aged above 40. Digital mammography is used by 
radiologists for analysis and interpretation of cancer. Visual 
reading and interpretation of mammograms is a very demanding 
and expensive job. Even well-trained experts may have an 
interobserver variation rate of 65-75 percent. Extraction of the 
breast contour and segmentation of the pectoral muscle are 
necessary in order to limit the search for abnormalities by 
Computer Aided Diagnosis (CAD). A new technique for breast 
border extraction and pectoral muscle segmentation is explored 
in this paper. The technique is applied to 250 MIAS 
mammograms. The method achieves a success rate of about 98% 
in segmenting the pectoral muscle. 

Keywords - Image processing, mammography, morphology, filter, 
edge detection. 



I. INTRODUCTION 

One of the leading causes of death among women is 
breast cancer. Early diagnosis and subsequent treatment can 
significantly improve the chance of survival for patients with 
breast cancer. The most effective method for the detection of 
early breast cancer is mammography. Mammograms are among the 
most difficult radiological images for radiologists to interpret. 
Studies have shown that radiologists do not detect all breast 
cancers that are retrospectively detected on the mammograms. 
Detection is the ability to identify potential abnormalities, 
such as microcalcifications, masses, and architectural 
distortions. Diagnosis is the ability to characterize or classify 
a detected abnormal entity as being either benign or malignant. 
However, before CADe algorithms can perform their task of 
identifying suspicious regions in a mammogram, a series of 
pre-processing steps must be taken. These include 
mammogram orientation, label and artifact removal, 
mammogram enhancement, breast contour detection and 
pectoral muscle segmentation. 

Many computer algorithms [1, 2, 3] have been proposed 
for automating various aspects of detecting the presence of 
cancer in mammograms. While detection rates for automatic 
systems are quite high, the false positive detection rates are 
also high. Accordingly, work continues on improving all 
aspects of computer-aided detection (CAD) for 
mammography. Breast border detection is complicated by 
factors such as the low contrast near the borders, image 
noise, and artifacts. 

In mammogram image processing [27-31] and 
computer-aided diagnosis of breast cancer, breast 
segmentation is an important pre-processing step. The 
accuracy and efficiency of processing algorithms increase 
if the processing is limited to a specific target region in 
an image. 

Extracting the pectoral muscle [23, 24, 25] is particularly 
important in automated mammogram image assessment. 
Segmentation of the pectoral muscle is a non-trivial, complex 
and demanding task, and it is complicated further by a 
number of factors. First, the muscle edge is not a 
straight line, but can be convex, concave or a mixture of both. 
Second, although the muscle edge may appear to be visually 
continuous, it exhibits variations in texture and 
sharpness. This paper describes a new technique for extracting 
the breast border and segmenting the pectoral muscle of digital 
mammograms. 

The remainder of this paper is organized as follows. In 
Section 2, the approaches to extraction of breast border and 
segmentation of pectoral muscle are described. The theory and 
proposed techniques are presented in Section 3. Experimental 
results are given and discussed in Section 4. Finally, the paper 
is summarized in Section 5. 

II. PREVIOUS APPROACHES TO BREAST BORDER EXTRACTION AND PECTORAL MUSCLE SEGMENTATION 

There have been various approaches to the task of 
isolating the breast region. 
M. Wirth et al. developed an algorithm [1] that uses 
morphological preprocessing and a fuzzy rule-based algorithm 
for breast region extraction. Kostas Marias et al. [2] used a 
boundary extraction technique based on the Hough transform 
followed by image gradient operators and morphology in order 
to make the breast region of the image coherent. Histogram 
equalization and thresholding are employed by Barba J. Leiner 
et al. [3] to extract only the region of the image that 
corresponds to the breast. 
Segmentation of the breast region in mammograms has 
traditionally been achieved using methods besides active 
contours [4]. Semmlow et al. [5] used a spatial filter and a 
Sobel edge detector to locate the breast boundary on 
xeromammograms. Global thresholding has been used in 
many cases to segment the breast region from the background 
[6-7]. The major problem with global thresholding is the 
nonuniform background region, although efforts using local 
thresholding, such as that of Masek et al. [8], have shown 
more promise. 

A system of masking images with different thresholds to 
find the breast edge was developed by Abdel-Mottaleb et al. [9]. 
A gradient-based method was proposed by Mendez et al. [10] to 
find the breast contour. They used a two-level thresholding 
technique to isolate the breast region of the mammogram. The 
smoothed mammogram is divided into three regions and then 
a tracking algorithm is applied to the mammogram to detect 
the border. Bick et al. [11] proposed a global segmentation 
approach that incorporates aspects of thresholding, region 
growing and morphological filtering. Lou et al. [12] proposed 
a method based on the assumption that the trace of intensity 
values from the breast region to the air-background is a 
monotonically decreasing function. 

One of the inherent limitations of these methods is the 
fact that very few of them preserve the skin or nipple. The 
most promising method of extracting the breast contour 
focuses on modeling the non-breast region of a mammogram 
using a polynomial method, as described by Chandrasekhar 
and Attikiouzel [13, 14]. 

Maysam Shahedi et al. proposed a new algorithm [15] for 
automatic breast border detection in digital mammograms 
based on a local adaptive thresholding method. Roshan 
Dharshana Yapa et al. presented a new algorithm [16] for 
skin-line estimation and breast segmentation using the fast 
marching algorithm. They introduced some modifications to 
the traditional fast marching method, specifically to improve 
the accuracy of skin-line estimation and breast tissue 
segmentation. 

The method proposed in [17] first determines the 
intensity value of the background in order to find the pixels 
that form the border line. The breast centre is then taken as 
the starting point for a simple region growing algorithm. H. 
Mirzaalian et al. proposed an algorithm [18] based on 
polynomial modeling to detect the breast contour. Ayman et al. 
implemented two methods [19] on a number of mammogram images 
and reported that the segmentation outputs of these methods were 
very efficient. The method proposed in [20] applies 
meta-heuristic methods such as Ant Colony Optimization 
(ACO) and Genetic Algorithm (GA) to identify 
suspicious regions in mammograms. 

There have been various approaches to the task of 
segmenting the pectoral muscle. 

A histogram-based thresholding technique is used by K. 
Thangavel and M. Karnan [23] to separate the pectoral muscle 
region. The global optimum is taken as the threshold value. 
Intensity values smaller than the global optimum threshold are 
set to zero, and gray values greater than the threshold are 
set to one, so that the gray level mammogram image is converted 
to a binary image. Erosion and dilation operations are applied 
to better preserve the pectoral muscle region. The white pixels 
in the lower left corner of the mammogram image indicate the 
pectoral muscle region. 

Kwok et al. [24] developed a method for automatic 
pectoral muscle segmentation on mammograms by straight 
line estimation and cliff detection. A straight line estimates the 
muscle edge, and cliff detection refines the detected edge by 
surface smoothing and edge detection in a restricted 
neighborhood. 

H. Mirzaalian et al. developed [25] a new method for the 
identification of the pectoral muscle in MLO mammograms. 
The method is based on a nonlinear diffusion 
algorithm. They compared their results with those obtained by 
two expert radiologists. To evaluate the accuracy of the proposed 
method, HDM (Hausdorff Distance Measure) and MAEDM 
(Mean of Absolute Error Distance Measure) were used. 

R.J. Ferrari et al. proposed [26] a new method for the 
identification of the pectoral muscle in MLO mammograms 
based upon a multiresolution technique using Gabor wavelets. 
This method overcomes the limitation of the straight-line 
representation considered in their initial investigation. The 
results of the Gabor-filter-based method indicated low 
Hausdorff distances with respect to the hand-drawn pectoral 
muscle edges. 

Mario Mustra et al. [17] use wavelet decomposition, 
image blurring and edge detection with the Sobel filter for 
breast border detection and pectoral muscle segmentation. N. 
Nicolau et al. [34] proposed the use of Independent 
Component Analysis (ICA) for identification and subsequent 
removal of the pectoral muscle. 

III. PROPOSED BREAST BORDER EXTRACTION AND 
PECTORAL MUSCLE SEGMENTATION TECHNIQUE 

The block diagram for pectoral muscle segmentation is 
shown in Fig. 1. A short description of each block is given below. 

[Block diagram: Mammogram input -> Breast Border Detection -> 
Locate the Region Containing the Pectoral Muscle -> 
Wavelet Decomposition -> Mammogram with Pectoral Muscle Segmentation] 

Figure 1: Steps carried out for pectoral muscle segmentation. 

3.1 Breast Border Detection 

We explored a new technique for breast region 
segmentation using morphological and filtering techniques. 
The breast border is detected in the following steps: 
removal of noise by a median filter, removal of artifacts by 
morphological operations, edge detection using the Sobel method, 
filtering, and finding the perimeter of the binarized image, 
which gives the breast border. 

Removal of Noise 

A median filter is used to remove the noise. It is a 
nonlinear spatial filter that removes impulsive noise from an 
image. In the proposed method, each output pixel contains the 
median value of the 3x3 neighborhood around the corresponding 
pixel in the input image. 
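
As an illustration of the 3x3 median filtering step, here is a minimal pure-Python sketch. The function name `median_filter_3x3` and the border handling (using only the neighbours that exist) are assumptions made for this example; the paper does not state how image borders are treated.

```python
from statistics import median_low


def median_filter_3x3(image):
    """Return a filtered copy of `image` (a list of equal-length rows).

    Each output pixel is the median of the 3x3 neighbourhood around the
    corresponding input pixel. At the borders only the neighbours that
    exist are used (an assumption of this sketch).
    """
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [
                image[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
            ]
            # median_low keeps the result an actual pixel value
            out[r][c] = median_low(window)
    return out


# A single impulsive-noise pixel (255) in a flat region is removed.
noisy = [
    [10, 10, 10, 10],
    [10, 255, 10, 10],
    [10, 10, 10, 10],
]
filtered = median_filter_3x3(noisy)
```

In this toy image the isolated bright pixel is replaced by the value of its neighbours, which is the behaviour expected for impulsive noise.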

Artifacts Removal 

The original mammogram is opened using a suitable 
structuring element and is then reconstructed. The next step is 
to threshold the difference image at 102, a value that was 
obtained experimentally. Finally, morphological operators are 
applied to smooth irregularities and expand the region. Fig. 2 
shows the results of these steps on MIAS image mdb003. 





Figure 2: Results for MIAS image mdb003. (a) Original image; (b) Artifacts 
removed in mdb003 

Edge Detection and Filtering Techniques 

This step uses the Sobel edge detector followed by 
dithering and 2-D order-statistic filtering. The Sobel method 
finds edges using the Sobel approximation to the derivative. 
Edge detection is followed by dithering. A logical OR 
operation is applied to the dithered and edge-detected images. 
2-D order-statistic filtering is then applied to the resulting 
image. The result for mdb003 after applying these steps is 
shown in Fig. 3. 





Figure 3: Results for MIAS image mdb003. (a) Edge detection; (b) Dithering; 
(c) 2-D order-statistic filtering 

Multidimensional image filtering 

This step removes noise using multidimensional 
image filtering. The image is filtered with a rotationally 
symmetric Gaussian low-pass filter. The image is then converted 
to a binary image and erosion is carried out. Fig. 4 shows the 
results for MIAS image mdb003 after applying these steps. 





Figure 4: Results for MIAS image mdb003 

Find perimeter pixels in binary image and superimpose on the 
original image 

Finally, the perimeter pixels of the binary image are found. 
This perimeter is the boundary of the breast. Fig. 5 
shows the results. A pixel is part of the perimeter if it is 
nonzero and is connected to at least one zero-valued pixel. 
8-connectivity is used. 
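
The perimeter rule stated above can be sketched in a few lines of pure Python. The function name `perimeter_pixels` is hypothetical, and ignoring neighbours that fall outside the image is an assumption of this sketch rather than something the paper specifies.

```python
def perimeter_pixels(binary):
    """Return the set of (row, col) perimeter pixels of a binary image.

    A pixel belongs to the perimeter if it is nonzero and at least one
    of its 8 neighbours is zero. Neighbours outside the image are
    ignored (an assumption of this sketch).
    """
    rows, cols = len(binary), len(binary[0])
    perimeter = set()
    for r in range(rows):
        for c in range(cols):
            if not binary[r][c]:
                continue
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    inside = 0 <= rr < rows and 0 <= cc < cols
                    if (dr or dc) and inside and not binary[rr][cc]:
                        perimeter.add((r, c))
    return perimeter


# A 3x3 block of ones inside a 5x5 image: every block pixel except the
# centre touches a zero-valued pixel.
block = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
border = perimeter_pixels(block)
```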




Figure 5: Contour superimposed on original image mdb003. 

3.2 Locate the region containing the pectoral muscle 

Pectoral muscle detection is a challenging task in the 
breast segmentation process. The technique for segmenting the 
pectoral muscle presented in this paper consists of a few 
steps and uses wavelet decomposition and edge detection 
with the Canny filter. 

The region of interest containing the pectoral muscle is 
determined in two steps. First, a rectangle that encloses the 
pectoral muscle is determined; this rectangle is then reduced 
so that the processing time for pectoral muscle segmentation 
is reduced further. The initial rectangle is defined by three 
points A, B and C. For example, if the image shows the MLO view 
of the right breast, the first point A is the top left corner 
of the image, with coordinates (1,1). The second point B is 
determined by the contour of the skin-air interface. The third 
point C is chosen to be approximately at half of the image 
height. These three points determine a rectangle. Fig. 7 shows 
the breast contour superimposed on the image mdb016 and the 
rectangle ABCD. 



Now a line FG is drawn through E, parallel to the line BD. For 
all 250 images, the reduced rectangle AFGD still includes the 
pectoral muscle. Fig. 8 shows this result for mdb016. 




Figure 8: The reduced area containing the pectoral muscle region is 
enclosed in AFGD. 



3.3 Wavelet decomposition 

A fourth-level wavelet decomposition is performed. The 
fourth level gives the best results for detecting larger 
structures such as the pectoral muscle, because it preserves 
enough coarse detail while removing fine details such as noise 
and granulation. In this paper, a Daubechies 
filter has been used. Daubechies wavelets are a family of 
orthogonal wavelets that define a discrete wavelet transform and 
are characterized by a maximal number of vanishing moments for 
a given support. For each wavelet of this class, there 
is a scaling function which generates an orthogonal 
multiresolution analysis. Fig. 9 shows a Daubechies 20 2-D 
wavelet. 
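
To make the idea of a multi-level Daubechies decomposition concrete, the following sketch applies four levels of a 1-D discrete wavelet transform to a single signal (for example one image row). It deliberately swaps in the short 4-tap Daubechies filter (D4) instead of the Daubechies 20 wavelet mentioned above, and it uses periodic boundary extension; both choices, and the function names `dwt_step` and `wavedec`, are assumptions made only to keep the example small.

```python
import math

SQRT3 = math.sqrt(3.0)
NORM = 4.0 * math.sqrt(2.0)
# Low-pass (scaling) coefficients of the 4-tap Daubechies filter (D4).
H = [(1 + SQRT3) / NORM, (3 + SQRT3) / NORM,
     (3 - SQRT3) / NORM, (1 - SQRT3) / NORM]
# High-pass (wavelet) coefficients derived from H.
G = [H[3], -H[2], H[1], -H[0]]


def dwt_step(signal):
    """One decomposition level with periodic extension.

    Returns (approximation, detail), each half as long as `signal`,
    whose length must be even.
    """
    n = len(signal)
    approx, detail = [], []
    for k in range(0, n, 2):
        window = [signal[(k + t) % n] for t in range(4)]
        approx.append(sum(h * x for h, x in zip(H, window)))
        detail.append(sum(g * x for g, x in zip(G, window)))
    return approx, detail


def wavedec(signal, levels):
    """Repeat dwt_step on the approximation `levels` times."""
    approx, details = list(signal), []
    for _ in range(levels):
        approx, detail = dwt_step(approx)
        details.append(detail)
    return approx, details


# A constant signal has no fine detail: all detail coefficients vanish
# and each level scales the approximation by sqrt(2).
approx4, details4 = wavedec([1.0] * 32, levels=4)
```

After four levels only the coarse approximation remains, which is the property the text relies on for suppressing noise and granulation while keeping large structures.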




Figure 7: Breast contour superimposed on the image mdb016 and the 
rectangle ABCD determined. 

The rectangle is reduced in order to reduce 
the processing time for pectoral muscle segmentation. This is 
done in the following way. A new point E is determined on the 
breast contour such that E has the maximum distance from the 
line BD towards point A. 
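
The selection of point E can be sketched as a search for the contour point with the largest perpendicular distance from the line BD, measured on the side where A lies. The coordinate convention (x, y), the function names and the sample points below are hypothetical and serve only to illustrate the geometry.

```python
def signed_distance(p, b, d):
    """Signed perpendicular distance of point p from the line through b and d."""
    (px, py), (bx, by), (dx, dy) = p, b, d
    cross = (dx - bx) * (py - by) - (dy - by) * (px - bx)
    length = ((dx - bx) ** 2 + (dy - by) ** 2) ** 0.5
    return cross / length


def find_point_e(contour, a, b, d):
    """Return the contour point farthest from line BD on the side of A."""
    # The sign of A's distance tells us which side of BD counts as "towards A".
    sign = 1.0 if signed_distance(a, b, d) >= 0 else -1.0
    return max(contour, key=lambda p: sign * signed_distance(p, b, d))


# Hypothetical example: BD is the vertical line x = 10 and A is the origin.
A, B, D = (0, 0), (10, 0), (10, 8)
contour = [(9, 1), (6, 3), (8, 6), (10, 7)]
E = find_point_e(contour, A, B, D)
```

In this example E is (6, 3), the contour point that lies 4 units from BD on the side of A.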




Figure 9: Daubechies 20 2-D wavelet 

After the wavelet decomposition, edges detected 
by the Canny filter inside the pectoral muscle region are 
removed by approximating the muscle boundary with a straight 
line that connects the upper right corner and the lower left 
corner of the muscle region, in the case of a right breast image. 

Some of the results of the proposed method for pectoral 
muscle identification are shown below. Fig. 12 shows 
successful results of the proposed method. 



The proposed method was applied to 250 mammograms from the 
Mammography Image Analysis Society (MIAS) database [21]. 
The results obtained are discussed below. The breast 
contour detected in the mammograms was evaluated 
with the Hausdorff Distance Measure (HDM) [22] and the 
Mean of Absolute Error Distance Measure (MAEDM). 
The evaluation is based on distance transforms and image 
algebra between the edges identified by radiologists and by 
the proposed method. The accuracy of contour detection is 99.06%. 
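
For reference, the textbook Hausdorff distance between two point sets can be computed as in the sketch below. This is the standard definition, not necessarily the exact distance-transform variant of [22] used in the evaluation; the function names and sample contours are hypothetical.

```python
import math


def directed_hausdorff(a, b):
    """Largest distance from a point of `a` to its nearest point in `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


# Hypothetical contours: one drawn by an expert, one detected automatically.
expert = [(0, 0), (1, 0), (2, 0)]
detected = [(0, 1), (1, 1), (2, 3)]
distance = hausdorff(expert, detected)
```

A small value means that every point of each contour has a nearby point on the other contour, so a single badly placed point is enough to increase the measure.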

Some of the results of the proposed method for breast 
contour extraction are shown below. Fig. 10 shows a 
successful result of the proposed method. Fig. 11 shows a 
failure case. 




Figure 10: Mammogram segmentation results for MIAS image mdb016. (a) 
Original mammogram; (b) Noise and artifacts removed after filtering and 
morphological operations; (c) Binary image; (d) Contour superimposed on 
the original. 



Figure 11: Mammogram segmentation results for MIAS image mdb012. (a) Original 
mammogram; (b) Image after removal of artifacts; (c) Contour 
superimposed on original image. 




Figure 12: Pectoral muscle identification results for MIAS image mdb016. 
(a) Breast contour superimposed on original image; (b) The region of interest 
that contains the pectoral muscle; (c) Segmented area that contains the pectoral 
muscle; (d) Wavelet decomposed image; (e) Pectoral muscle edge identified 
on image. 



V. CONCLUSION. 

In this paper, a method for detection of the breast 
contour and segmentation of the pectoral muscle is presented. 
The proposed method for detecting the breast border contour was 
tested on 250 MIAS images and achieved a 99.06% success 
rate in detecting the correct skin-air interface. The 
method fails to detect the correct skin-air 
interface for very few mammograms, because of noise (large 
artifacts). The advantage of this method is its low algorithmic 
complexity and therefore short processing time. Further 
development concerns smoothing of the breast border and 
the pectoral muscle segmentation line. The proposed technique is 
fully autonomous and is able to preserve the skin and nipple. 

Pectoral muscle detection is a challenging task because the 
muscle is not well differentiated from the surrounding breast 
tissue. The intensity variation between the pectoral muscle 
and the surrounding tissue differs from one mammogram to another. 
The method proposed in this paper uses wavelet decomposition. 
This approach works well, with an accuracy of 98%, because the 
pectoral muscle is a rather large object to detect. Future 
work will focus on smoothing the breast contour and 
pectoral muscle edge. 

Abstract — This paper presents improved content based image 
retrieval (CBIR) techniques based on multilevel Block 
Truncation Coding (BTC) using multiple threshold values. Block 
Truncation Coding based features are among the CBIR methods 
proposed using shape features of the image. The shape averaging 
methods used here are BTC Level-1, BTC Level-2, BTC Level-3 
and BTC Level-4. The feature vector size per image is 
greatly reduced by taking the mean of each plane as the 
threshold value and then dividing each plane using this 
threshold. To assess the performance of the algorithm, 
precision and recall values are calculated for the shape 
averaging methods. Instead of using all pixel data of the image 
as the feature vector for image retrieval, feature vectors of 
six, twelve, twenty-four and forty-eight elements can be used 
for BTC Level-1, Level-2, Level-3 and Level-4 respectively. 
This results in better 
performance. The proposed CBIR techniques are tested on a 
generic image database of 1000 images spread across 11 
categories. For each proposed CBIR technique, 55 queries (5 per 
category) are fired on the generic image database. To compare the 
performance of the image retrieval techniques, average precision 
and recall are computed over all queries. The results show a 
performance improvement (higher precision and recall values) 
with the proposed methods compared to BTC Level-1. 

Keywords- Content Based Image Retrieval (CBIR), BTC Level-1, 
BTC Level-2, BTC Level-3, BTC Level - 4. 



I. INTRODUCTION 

Information retrieval (IR) is the science of searching for 
documents, for information within documents, and for 
metadata about documents, as well as that of searching 
relational databases and the World Wide Web. There is overlap 
in the usage of the terms data retrieval, document retrieval, 
information retrieval, and text retrieval, but each also has its 
own body of literature, theory and technologies. IR is 
interdisciplinary, based on computer science, mathematics, 
cognitive psychology, linguistics, statistics, and physics. 
Automated information retrieval systems are used to reduce 
what has been called "information overload". Many universities 
and public libraries use IR systems to provide access to books 
and journals. Web search engines are the most visible IR 
applications. Images have a giant share in the information 
being stored and retrieved. 

A. Image Retrieval 

Image search is a specialized data search used to find 
images. A user may give a keyword, sketch or image to an image 
search engine to retrieve relatively similar images from 
the image databases. The similarity used as the search criterion 
could be meta tags, color distribution in images, or 
region/shape attributes. Most traditional methods of image 
retrieval utilize some method of adding metadata such as 
captioning, keywords, or descriptions to the images so that 
retrieval can be performed over the annotation words [23]. The 
limitations of the text-based approach are that it is subject to 
human perception and that it requires the images to be annotated. 
Annotating every image is a cumbersome and expensive task. 

B. Content-based image retrieval 

Content-based image retrieval (CBIR) is the application of 
computer vision to the image retrieval problem, that is, the 
problem of searching for digital images in large databases. The 
term 'content' in this context might refer to color, shapes and 
textures. The color aspect can be captured by techniques such as 
averaging and histograms [4, 5, 7]. The texture aspect can be 
captured by using transforms [12] or vector quantization [9, 
11, 15]. Finally, the shape aspect can be captured by using 
gradient operators or morphological operators. Some of the 
major areas of application are: art collections, medical 
diagnosis, crime prevention, the military, intellectual 
property, architectural and engineering design, and 
geographical information and remote sensing systems. 

II. EDGE EXTRACTION 

Edge detection is very important in image analysis. The 
edges give an idea of the shapes of the objects present in the 
image. Hence they are useful for segmentation, registration, 
and identification of objects in a scene. The problem with 
edge extraction using gradient operators is that edges are 
detected in either the horizontal or the vertical direction, as 
the gradient operators take only the first order derivative of 
the image. Shape feature extraction in image retrieval requires 
the extracted edges to be connected in order to reflect the 
boundaries of the objects present in the image. The slope 
magnitude method [1] is used along with the gradient operators 
(Sobel, Prewitt, Robert and Canny) [1] to extract the shape 
features in the form of connected boundaries. The slope 
magnitude method is applied as follows. First, the image 
is convolved with the Gx mask to get the x gradient 
and with the Gy mask to get the y gradient of the image. Then 
both gradients are squared individually. The square 
root of the sum of the two squared terms gives the extracted 
connected edges of the image, as given in equation 1. 

\[ G = \sqrt{G_x^2 + G_y^2} \tag{1} \] 
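
A minimal sketch of the slope magnitude computation of equation 1 follows, using the standard 3x3 Sobel masks. The function names are hypothetical, border pixels are simply left at zero (an assumption), and the masks are applied as a correlation, which gives the same magnitude as a convolution for these masks.

```python
import math

# Standard 3x3 Sobel masks for the x and y gradients.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def apply_mask(image, mask, r, c):
    """Weighted sum of the 3x3 neighbourhood centred on (r, c)."""
    return sum(mask[i][j] * image[r - 1 + i][c - 1 + j]
               for i in range(3) for j in range(3))


def slope_magnitude(image):
    """Return G = sqrt(Gx^2 + Gy^2) for every interior pixel.

    Border pixels are left at 0.0 (an assumption of this sketch).
    """
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = apply_mask(image, SOBEL_X, r, c)
            gy = apply_mask(image, SOBEL_Y, r, c)
            out[r][c] = math.sqrt(gx * gx + gy * gy)
    return out


# A vertical step edge: the x gradient responds, the y gradient does not.
step = [[0, 0, 10, 10] for _ in range(4)]
edges = slope_magnitude(step)
```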



III. BLOCK TRUNCATION CODING 

Block truncation coding (BTC) is a simple image coding 
technique developed in the early years of digital imaging. BTC 
has played an important role in the history of digital image 
coding in the sense that many advanced coding techniques 
have been developed based on BTC or inspired by the success 
of BTC. 

This method first divides the image to be coded into small 
non-overlapping image blocks, typically of size 4x4 pixels, to 
achieve reasonable quality. The small blocks are coded one at 
a time. For each block, the original pixels within the block are 
coded using a binary bitmap of the same size as the original 
block and two mean pixel values. 
The method first computes the mean pixel value of the whole 
block, and then each pixel in that block is compared to the 
block mean. If a pixel is greater than or equal to the block 
mean, the corresponding pixel position of the bitmap has 
a value of 1; otherwise it has a value of 0. Two mean 
pixel values, one for the pixels greater than or equal to the 
block mean and the other for the pixels smaller than the block 
mean, are also calculated. At the decoding stage, the small 
blocks are decoded one at a time. For each block, the pixel 
positions where the corresponding bitmap has a value of 1 are 
replaced by one mean pixel value, and the pixel positions where 
the corresponding bitmap has a value of 0 are replaced by the 
other mean pixel value. 

It was quite natural to extend BTC to multi-spectrum 
images such as color images. Most color images are recorded 
in RGB space, which is perhaps the most well-known color 
space. As described previously, BTC divides the image to be 
coded into small blocks and codes them one at a time. For 
single-bitmap BTC of a color image, a single binary bitmap of 
the same size as the block is created, and two colors are 
computed to approximate the pixels within the block. To create 
a binary bitmap in the RGB space, an inter band average image 
(IBAI) is first created and a single scalar value is found as 
the threshold value. The bitmap is then created by comparing the 
pixels in the IBAI with the threshold value. 

A. Bit Calculation 

Let X = {R(i,j), G(i,j), B(i,j)}, where i = 1, 2, ..., m and 
j = 1, 2, ..., n, be an m x n color image in RGB space. The 
interband average image can be computed as IA = {IB(i,j)}, 
where i = 1, 2, ..., m and j = 1, 2, ..., n, and where 

\[ IB(i,j) = \frac{1}{3}\left[ R(i,j) + G(i,j) + B(i,j) \right] \tag{2} \] 

The threshold T is computed as the mean of IB(i,j): 

\[ T = \frac{1}{m \times n} \sum_{i=1}^{m} \sum_{j=1}^{n} IB(i,j) \tag{3} \] 

The binary bitmap {BM(i,j)}, with i = 1, 2, ..., m and 
j = 1, 2, ..., n, is computed as 

\[ BM(i,j) = \begin{cases} 1, & \text{if } IB(i,j) \ge T \\ 0, & \text{if } IB(i,j) < T \end{cases} \tag{4} \] 


B. Upper mean and Lower mean calculation 

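After the creation of the bitmap, two representative (mean) 
colors are computed: the Upper Mean and the Lower Mean. 
The Upper Mean UM = (Rm1, Gm1, Bm1) is 
computed using the following equations. 

\[ Rm1 = \frac{1}{\sum_{i=1}^{m} \sum_{j=1}^{n} BM(i,j)} \sum_{i=1}^{m} \sum_{j=1}^{n} BM(i,j) \, R(i,j) \tag{5} \] 

\[ Gm1 = \frac{1}{\sum_{i=1}^{m} \sum_{j=1}^{n} BM(i,j)} \sum_{i=1}^{m} \sum_{j=1}^{n} BM(i,j) \, G(i,j) \tag{6} \] 

\[ Bm1 = \frac{1}{\sum_{i=1}^{m} \sum_{j=1}^{n} BM(i,j)} \sum_{i=1}^{m} \sum_{j=1}^{n} BM(i,j) \, B(i,j) \tag{7} \] 

The Lower Mean LM = (Rm2, Gm2, Bm2) is computed using the 
following equations: 

\[ Rm2 = \frac{1}{m \times n - \sum_{i=1}^{m} \sum_{j=1}^{n} BM(i,j)} \sum_{i=1}^{m} \sum_{j=1}^{n} \{1 - BM(i,j)\} \, R(i,j) \tag{8} \] 

\[ Gm2 = \frac{1}{m \times n - \sum_{i=1}^{m} \sum_{j=1}^{n} BM(i,j)} \sum_{i=1}^{m} \sum_{j=1}^{n} \{1 - BM(i,j)\} \, G(i,j) \tag{9} \] 

\[ Bm2 = \frac{1}{m \times n - \sum_{i=1}^{m} \sum_{j=1}^{n} BM(i,j)} \sum_{i=1}^{m} \sum_{j=1}^{n} \{1 - BM(i,j)\} \, B(i,j) \tag{10} \] 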



The Upper Mean and Lower Mean together form a 
feature vector, or signature, of the image. For every image 
stored in the database, these feature vectors are computed and 
stored in a feature vector table. Whenever a query image is 
given to the CBIR system, the feature vector of the query image 
is computed and matched against the feature vector 
table entries to find the best possible matches at a given 
accuracy rate. Here we have used Direct Euclidean Distance as 
the similarity measure for comparing images in 
Content Based Image Retrieval applications. 
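
The bitmap and mean-colour computation described in this section can be sketched as follows. The function name `btc_features`, the flat list-of-tuples input format, and the fallback value used when no pixel falls below the threshold are assumptions of this example, not details given in the paper.

```python
def btc_features(pixels):
    """Compute the six-element BTC signature of an RGB image.

    `pixels` is a flat list of (R, G, B) tuples. The result is
    [Rm1, Gm1, Bm1, Rm2, Gm2, Bm2]: upper mean followed by lower mean.
    """
    n = len(pixels)
    # Inter band average image: per-pixel mean of R, G and B.
    ib = [(r + g + b) / 3.0 for r, g, b in pixels]
    # Threshold: mean of the inter band average image.
    threshold = sum(ib) / n
    # Binary bitmap: 1 where the pixel is at or above the threshold.
    bitmap = [1 if v >= threshold else 0 for v in ib]
    ones = sum(bitmap)          # always >= 1 (the maximum is >= the mean)
    zeros = n - ones
    upper, lower = [], []
    for channel in range(3):
        hi = sum(p[channel] for p, bit in zip(pixels, bitmap) if bit)
        lo = sum(p[channel] for p, bit in zip(pixels, bitmap) if not bit)
        upper.append(hi / ones)
        # If every pixel is at or above the threshold there is no lower
        # set; 0.0 is used as a placeholder (an assumption of this sketch).
        lower.append(lo / zeros if zeros else 0.0)
    return upper + lower


# Two dark and two bright pixels.
signature = btc_features([(10, 20, 30), (20, 30, 40),
                          (200, 210, 220), (220, 230, 240)])
```

For this toy input the two bright pixels form the upper set and the two dark pixels the lower set, so the signature is (210, 220, 230) followed by (15, 25, 35).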

IV. MULTILEVEL BTC 

As described in Section III, the image data is divided 
into 6 parts using the 3 means calculated for each of the planes 
(R, G and B). This is called BTC Level 1. Similarly, if the 
image data is divided into 12 parts using the 6 means 
calculated for each of the 6 parts in Level 1, we obtain BTC 
Level 2 data [21]. 

Here the bitmaps are prepared using the upper and lower mean 
values of the individual colour components. For the Red colour 
component, the bitmaps "BMUR" and "BMLR" are generated 
as given in equations 11 and 12, where UR and LR denote the 
upper and lower mean of the Red component. Similarly, "BMUG" and 
"BMLG" can be generated for the Green colour component and 
"BMUB" and "BMLB" for the Blue colour component. 

\[ BMUR(i,j) = \begin{cases} 1, & \text{if } R(i,j) \ge UR \\ 0, & \text{if } R(i,j) < UR \end{cases} \tag{11} \] 

\[ BMLR(i,j) = \begin{cases} 1, & \text{if } R(i,j) \ge LR \\ 0, & \text{if } R(i,j) < LR \end{cases} \tag{12} \] 



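Using these bitmaps, two mean colours per bitmap are calculated, 
one for the pixels greater than or equal to the threshold and 
the other for the pixels smaller than the threshold. The first 
two components of the upper mean colour UM = (UUR, ULR, UUG, 
ULG, UUB, ULB) are given as follows. 

\[ UUR = \frac{1}{\sum_{i=1}^{m} \sum_{j=1}^{n} BMUR(i,j)} \sum_{i=1}^{m} \sum_{j=1}^{n} BMUR(i,j) \, R(i,j) \tag{13} \] 

\[ ULR = \frac{1}{\sum_{i=1}^{m} \sum_{j=1}^{n} BMLR(i,j)} \sum_{i=1}^{m} \sum_{j=1}^{n} BMLR(i,j) \, R(i,j) \tag{14} \] 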

The first two components of the Lower Mean LM = (LUR, 
LLR, LUG, LLG, LUB, LLB) are computed using the following 
equations. 

\[ LUR = \frac{1}{m \times n - \sum_{i=1}^{m} \sum_{j=1}^{n} BMUR(i,j)} \sum_{i=1}^{m} \sum_{j=1}^{n} \{1 - BMUR(i,j)\} \, R(i,j) \tag{15} \] 

\[ LLR = \frac{1}{m \times n - \sum_{i=1}^{m} \sum_{j=1}^{n} BMLR(i,j)} \sum_{i=1}^{m} \sum_{j=1}^{n} \{1 - BMLR(i,j)\} \, R(i,j) \tag{16} \] 

In Mask Shape BTC based image retrieval as well, four variations 
are considered, using different gradient operators. 

VI. IMPLEMENTATION 

The image retrieval methods discussed are implemented using 
MATLAB 7.0 on an Intel Core 2 Duo T8100 processor (2.1 GHz) 
with 2 GB of RAM. To check the performance of the proposed 
technique, a database of 1000 variable sized images spread 
across 11 categories has been used [3]. Five queries were 
selected from each category of images. Mean Squared Error 
(MSE) is used as the similarity measure for comparing the query 
image with all the images in the image database. Let Vpi and 
Vqi be the feature vectors of image P and query image Q 
respectively, each of size n. The MSE is then given by 
equation 17. 

\[ MSE = \frac{1}{n} \sum_{i=1}^{n} (Vp_i - Vq_i)^2 \tag{17} \] 
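
A minimal sketch of MSE-based matching is given below. It assumes the usual 1/n normalisation in equation 17, and the helper `rank_database` together with the toy feature vectors are hypothetical additions for illustration.

```python
def mse(vp, vq):
    """Mean squared error between two feature vectors of equal size."""
    if len(vp) != len(vq):
        raise ValueError("feature vectors must have the same size")
    return sum((p - q) ** 2 for p, q in zip(vp, vq)) / len(vp)


def rank_database(query, database):
    """Return image names sorted from most to least similar to `query`.

    `database` maps an image name to its feature vector.
    """
    return sorted(database, key=lambda name: mse(database[name], query))


# Hypothetical two-element feature vectors.
table = {"a": [2, 4], "b": [1, 1], "c": [0, 0]}
ranking = rank_database([0, 0], table)
```

Because only the ordering of the distances matters for retrieval, dropping the 1/n factor would produce the same ranking.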



To assess the retrieval effectiveness, we have used 
precision and recall as statistical comparison parameters for 
the proposed CBIR techniques. The standard definitions of 
these two measures are given by the following equations. 

\[ \text{Precision} = \frac{\text{Number of relevant images retrieved}}{\text{Total number of images retrieved}} \tag{18} \] 

\[ \text{Recall} = \frac{\text{Number of relevant images retrieved}}{\text{Total number of relevant images in database}} \tag{19} \] 
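
The two measures can be computed for a single query as in the following sketch; the function name and the sample image identifiers are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    `retrieved` holds the identifiers returned by the system and
    `relevant` holds all identifiers in the database that belong to
    the query's category.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# 4 images retrieved, 2 of them relevant, 5 relevant images in total.
p, r = precision_recall(["a", "b", "c", "d"], ["a", "c", "e", "f", "g"])
```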



VII. RESULTS AND DISCUSSION 



These Upper Mean and Lower Mean values together form the 
feature vector for BTC Level 2. For every image stored in 
the database, these feature vectors are computed and stored in 
the feature vector table. 

Similarly, the feature vector for BTC Level 3 can be found 
by extending BTC Level 2, as shown in figure 20. 
The image is then divided into 24 parts using the 12 means 
generated in Level 2. Each plane gives 8 elements of the 
feature vector. For example, for the Red plane we get (UUUR, 
LUUR, ULUR, LLUR, UULR, LULR, ULLR, LLLR). 



V. PROPOSED CBIR TECHNIQUES 

The requirement that all database images have the same 
size for image retrieval can be removed by using the proposed 
Mask Shape BTC based CBIR methods. First, the shape 
features of the image are extracted by applying the slope 
magnitude method to the gradients of the image in the vertical 
and horizontal directions. BTC is then applied to the resulting 
Mask Shape images to obtain a shape feature vector of 
constant size, irrespective of the size of the image considered. 

Figure 1: Crossover points for all levels of BTC for Canny Operator 

Figure 1 shows a comparison between all four levels of 
BTC when the Canny operator is applied. To give a better 
understanding of the results, figure 2 shows a zoomed version 
of the same graph. From figure 2 we can see that level 3 gives 
the best performance in comparison to the other levels, whereas 
performance drops for level 4 due to the formation of 
null sets. Figure 3 shows a bar graph comparing the results of 
all four levels of BTC for the Canny operator. The same 
behaviour is observed with the other gradient operators as well. 

(IJCSIS) International Journal of Computer Science and Information Security, 
Vol. 09, No. 02, 2011 
http://sites.google.com/site/ijcsis/ 
ISSN 1947-5500 

Figure 2: Zoomed version of all levels of BTC for Canny Operator 

Figure 3: Comparison between all levels of BTC for Canny Operator 

The performance of all the operators with all four levels of 
BTC is shown in figures 4a and 4b. Figure 4a compares 
all gradient operators with respect to the 
BTC levels, and figure 4b compares all BTC 
levels with respect to the gradient operators. 

Figure 4a: Comparison between all operators based on BTC Levels 

Figure 4b: Comparison between all BTC levels based on Gradient Operators 

VIII. CONCLUSION 

From the experimental analysis and results, it is evident that, 
of the four gradient operators, the Canny gradient operator 
gives the best performance in the proposed shape based image 
retrieval techniques using BTC level 2 and BTC level 3. The 
Robert gradient operator gives the best performance for BTC 
level 3 and BTC level 4. The Sobel and Prewitt gradient operators 
give average performance for all 4 levels of BTC based 
CBIR methods. BTC level 3 gives the best performance for 
CBIR with all gradient operators as compared to the other 
levels of BTC, with BTC level 4 showing the lowest 
performance. 

Abstract — When different types of packets with different 
Quality of Service (QoS) requirements share the same network 
resources, it becomes important to use queue management and 
scheduling schemes in order to maintain the perceived quality at 
the end users at an acceptable level. Many schemes have been 
studied in the literature. These schemes use time priority (to 
maintain QoS for Real Time (RT) packets) and/or space priority 
(to maintain QoS for Non Real Time (NRT) packets). In this paper, 
we study and show the drawback of a combined time and space 
priority (TSP) scheme used to manage QoS for RT and NRT 
packets intended for an end user in a High Speed Downlink Packet 
Access (HSDPA) cell, and we propose an enhanced scheme 
(Enhanced Basic-TSP, EB-TSP) to improve QoS for the 
RT packets and to exploit the network resources efficiently. A 
mathematical model of the EB-TSP scheme is developed, and 
numerical results show the positive impact of this scheme. 

Keywords: HSDPA; QoS; Queuing; Scheduling; RT and NRT 
packets; Markov Chain. 



I. INTRODUCTION 



In recent years, the performance of mobile cellular 
telecommunication networks has grown continuously 
through increased hardware capacity, and new generations of 
mobile networks offer more bandwidth resources. With this 
development, new services with high bandwidth demand and 
different QoS requirements have been incorporated, and their 
effect needs to be taken into consideration. 
Despite the efforts made on the infrastructure to improve 
network services, the disturbing impact of wireless 
transmission may degrade the quality perceived by the end 
users. It therefore becomes important to take 
additional measures in the networks. 

Two approaches are possible. The first is to adapt the 
content to the current network conditions at the end user. 
This is end-to-end QoS control [15]. The best known 
mechanisms to achieve this adaptation are Random Early 
Detection (RED) [8] and its variants [7]. The second approach 
is to manage network resources to offer network support for 
content; it is a network-centric approach. One of the most 
important representatives of this second approach is queue 
management and packet scheduling, which have an impact on the 
QoS attributes. When different types of packets with different 
QoS requirements share the same network resources, 
such as buffers and bandwidth, a priority scheme from the 
second approach has to be used. The priority scheme can be 
defined in terms of a policy determining [13]: 

• Which of the arriving packets are admitted to the 
buffer and how they are admitted 

And/or 

• Which of the admitted packets is served next 

The former priority service schemes are referred to as space 
priority schemes and attempt to minimize the packet loss of 
non real time (NRT) applications (web browsing, e-mail, ftp, 
or data access), for which the loss ratio is the restrictive 
quantity. The latter priority service schemes are referred to as 
time priority schemes and attempt to guarantee acceptable 
delay bounds to real time (RT) applications (voice or 
video), for which it is important that delay is bounded. 
Many priority schemes have been studied in the literature, and 
they have focused on either space priority or time priority. 
The authors of [14] present a model of multimedia traffic in 
a shared channel, but they take into consideration system 
details rather than the characteristics of the flows composing 
the traffic. The works in [1], [4], [12] study priority schemes 
and try to maximize the QoS level for the RT packets, without 
taking into account the resulting degradation of the QoS for NRT 
packets. 

In HSDPA (High-Speed Downlink Packet Access) 
technology, it is possible to implement packet scheduling 
algorithms that support multimedia traffic with diverse 
concurrent classes of flows being transmitted to the same end 
user [9]. Accordingly, Suleiman et al. present in [16] a queuing 
model for multimedia traffic over an HSDPA channel using a 
combined time priority and space priority (TSP priority) with 
a threshold to control the QoS measures of both RT and NRT 
packets. 

The basic idea of TSP priority [2] is that, in the buffer, RT
packets are given transmission priority (time priority), but the
number of accepted packets of this kind is limited. Thus, the TSP
scheme aims to provide both delay and loss differentiation.
The authors of [16], [17] studied an extension of the TSP scheme
incorporating thresholds to control the arrival of NRT
packets (Active TSP scheme), and showed, via simulation (using
OPNET), that the TSP scheme achieves better QoS measures for
both RT and NRT packets than FCFS (First Come
First Served) queuing.

To model the TSP scheme, mathematical tools were used
in [18] and QoS measures were derived analytically, but
some of the reported results are incorrect. The works [5], [6], [9]
corrected that paper and used MMPP and BMAP processes to
model the traffic sources.

When the basic TSP scheme is applied to a buffer in Node B
(in HSDPA technology), arriving RT packets are queued in
front of the NRT packets to receive priority transmission on
the shared channel. An NRT packet is transmitted only
when no RT packets are present in the buffer, so that the RT
QoS delay requirements are not compromised [2].
In order to fulfil the QoS of the loss-sensitive NRT packets, the
number of admitted RT packets is limited to R, to devote more
space to the NRT flow in the buffer.



Figure 1: The B-TSP scheme applied to a buffer (diagram labels: RT packets, NRT packets, N, R)

This scheme has an important drawback: because the number of
RT packets cannot exceed the threshold R, RT packets are
dropped even when capacity is available in the section of the
buffer reserved for NRT packets. This implies poor QoS
management for RT packets and poor management of buffer
space.

Hence, in this paper, we propose an algorithm to enhance the
basic TSP scheme (Enhanced Basic TSP: EB-TSP). The
priority function is modified to overcome the
drawback cited above, in order to improve QoS for RT packets
by reducing their loss probability, and to achieve
better management of the network resources.
The rest of this paper is organized as follows: Section 2
introduces the proposed buffer management scheme,
EB-TSP, and contrasts it with Basic-TSP. Section 3
presents and studies the mathematical model. The QoS
measures related to the proposed scheme are derived analytically
in Section 4. Section 5 presents the numerical results
and shows the effect that the proposed scheme has on
traffic performance. Finally, Section 6 provides the
concluding remarks.

II. EB-TSP Scheme Description

The Basic-TSP (B-TSP) buffer management scheme for
multimedia QoS control in HSDPA Node B, proposed by the
authors of [3], is defined to maintain inter-class prioritization
for end users with multiple flows. It consists of maintaining a
buffer for each user, in which RT and NRT flows are queued
according to the following priority scheme.
The RT flow packets are queued ahead of the NRT flow
packets of the same user, for priority scheduling/transmission
on the shared channel (time priority). At the same time, the
NRT flow packets get space priority in the user's buffer
queue. B-TSP queuing uses a threshold R to restrict
the maximum number of queued RT packets (Fig. 1).
In [18], the authors showed B-TSP to be an effective queuing
mechanism for joint RT and NRT QoS compared to
conventional priority queuing schemes.

To overcome the drawback of the B-TSP scheme cited in Section
I, we propose to use the following control mechanism.
When an RT packet arrives at the buffer, either the buffer is full or
there is free space. In the first case, if the number of RT
packets is less than R, an NRT packet is rejected and
the arriving RT packet enters the buffer; otherwise, the
arriving RT packet is rejected. In the second case, the
arriving RT packet enters the buffer.
Similarly, when an NRT packet arrives at the buffer, either the
buffer is full or there is free space. In the first case, if the number of RT
packets is less than R, the arriving NRT packet is
rejected; otherwise, an RT packet is rejected and the arriving
NRT packet enters the buffer. In the second case, the
arriving NRT packet enters the buffer.

Remark: In the buffer, the RT packets are always placed
in front of the NRT packets.
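
The control mechanism described above can be sketched as a single admission function. This is only an illustrative sketch, not the authors' implementation: the function name `eb_tsp_admit` and its return convention are assumptions made here, the buffer is reduced to two counters, and the threshold test follows the wording of this section ("less than R"), assuming 1 <= R <= N.

```python
def eb_tsp_admit(kind, n_rt, n_nrt, N, R):
    """Apply the EB-TSP admission rule to one arriving packet.

    kind is "RT" or "NRT"; n_rt and n_nrt are the numbers of RT and NRT
    packets currently in the buffer; N is the buffer size and R the RT
    threshold (hypothetical parameter names, assuming 1 <= R <= N).
    Returns (new_n_rt, new_n_nrt, lost), where lost is the class of the
    packet that was dropped, or None when nothing was dropped.
    """
    if kind not in ("RT", "NRT"):
        raise ValueError("kind must be 'RT' or 'NRT'")
    full = (n_rt + n_nrt == N)
    if kind == "RT":
        if not full:
            return n_rt + 1, n_nrt, None      # free space: RT enters
        if n_rt < R:
            return n_rt + 1, n_nrt - 1, "NRT"  # an NRT packet is pushed out
        return n_rt, n_nrt, "RT"              # arriving RT packet is rejected
    # NRT arrival
    if not full:
        return n_rt, n_nrt + 1, None          # free space: NRT enters
    if n_rt < R:
        return n_rt, n_nrt, "NRT"             # arriving NRT packet is rejected
    return n_rt - 1, n_nrt + 1, "RT"          # an RT packet is pushed out


# Example with a buffer of size 5 and threshold 3.
print(eb_tsp_admit("RT", 2, 3, 5, 3))   # full buffer, fewer than R RT packets
print(eb_tsp_admit("NRT", 4, 1, 5, 3))  # full buffer, at least R RT packets
```

Keeping only two counters is enough here because, by the remark above, RT packets always sit in front of NRT packets, so the buffer contents are determined by the pair of counts.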

III. Mathematical Model 

A. Arrival and Service Processes

The arrival processes of RT and NRT packets are assumed
to be Poisson with rates $\lambda_{RT}$ and $\lambda_{NRT}$ respectively.
The service times of RT and NRT packets are assumed to be
exponential with rates $\mu_{RT}$ and $\mu_{NRT}$ respectively.

We also assume that the arrival processes and the service
times are mutually independent.
The state of the system at any time t can be described by the
process $X(t) = (X_1(t), X_2(t))$,
where $X_1(t)$ (respectively $X_2(t)$) is the number of RT
(respectively NRT) packets in the buffer at time t.
The state space of X(t) is $E = \{0, \dots, N\} \times \{0, \dots, N\}$.

B. Stability

Since the arrival processes are Poisson (i.e., the inter-arrival
times are exponential), the service times are exponential, and
these processes are mutually independent, X(t) is a Markov process.
It is easy to prove that X(t) is irreducible, because all the
states communicate.
Moreover, E is a finite space, so X(t) is positive recurrent.
Consequently, X(t) is an ergodic process and the equilibrium
probability exists.

C. Equilibrium Probability

We denote the equilibrium probability of X(t) at the state (i,j)
by $\{p(i,j)\}$, where:

$$p(i,j) = \lim_{t \to \infty} P(X_1(t) = i,\; X_2(t) = j)$$

It is the solution of the following balance equations:

$$(\lambda_{NRT} + \lambda_{RT})\, p(0,0) = \mu_{NRT}\, p(0,1) + \mu_{RT}\, p(1,0)$$

$$(\lambda_{RT} + \mu_{NRT})\, p(0,N) = \lambda_{NRT}\, p(0,N-1)$$

$$(\lambda_{NRT} + \mu_{RT})\, p(N,0) = \lambda_{RT}\, p(N-1,0)$$

For $i = 1, \dots, N-1$:

$$(\lambda_{NRT} + \mu_{RT} + \lambda_{RT})\, p(i,0) = \lambda_{RT}\, p(i-1,0) + \mu_{RT}\, p(i+1,0)$$

For $j = 1, \dots, N-1$:

$$(\lambda_{RT} + \lambda_{NRT} + \mu_{NRT})\, p(0,j) = \mu_{RT}\, p(1,j) + \lambda_{NRT}\, p(0,j-1) + \dots$$

[the remaining terms of this right-hand side are cut off in the source]

For $i = R+1, \dots, N-1$:

$$(\mu_{RT} + \lambda_{NRT})\, p(i,N-i) = \lambda_{NRT}\, p(i,N-i-1) + \lambda_{RT}\, p(i-1,N-i)$$

For $i = 1, \dots, N-1$:

$$(\mu_{RT} + \lambda_{RT})\, p(i,N-i) = \lambda_{NRT}\, p(i,N-i-1) + \lambda_{RT}\, p(i-1,N-i)$$

For $i = 1, \dots, N-2$, $j = 1, \dots, N-i-1$:

[equation illegible in the source]

The equilibrium probability must verify the normalization
equation given by:

$$\sum_{i=0}^{N} \sum_{j=0}^{N-i} p(i,j) = 1$$

IV. QoS Measures

In this section, the loss probability and the delay for each
class of traffic are analytically presented.

A. Loss Probability

With the EB-TSP scheme, an RT packet is lost either when
the buffer is full and the number of RT packets is more than R
at the time of its arrival, or when an NRT packet arrives and
finds the buffer full and the number of RT packets is more
than R.
Then the loss probability of RT packets is given by:

$$P_{L\text{-}RT} = \lim_{t \to \infty} \frac{\int_0^t \mathbf{1}_{\{X_1(s)+X_2(s)=N,\; X_1(s) \ge R\}}(s)\, A^1(s)\, ds + \int_0^t \mathbf{1}_{\{X_1(s)+X_2(s)=N,\; X_1(s) > R\}}(s)\, A^2(s)\, ds}{N_1(t)}$$

Where:
$N_1(t)$ is the number of arriving RT packets in the buffer
during the time interval [0,t];
$A^1(s)$ (respectively $A^2(s)$) is the RT (respectively NRT)
arriving flow in the buffer at time s;
$\mathbf{1}_{C}(s)$ equals 1 if the condition C holds at time s, and 0 otherwise.

Since X is ergodic, we show that:

$$P_{L\text{-}RT} = \sum_{i=R}^{N} p(i,N-i) + \frac{\lambda_{NRT}}{\lambda_{RT}} \sum_{i=R+1}^{N} p(i,N-i)$$

Using the same analysis, we can show that the loss probability
of NRT packets is:

$$P_{L\text{-}NRT} = \sum_{i=0}^{R} p(i,N-i) + \frac{\lambda_{RT}}{\lambda_{NRT}} \sum_{i=0}^{R-1} p(i,N-i)$$

B. Average Number of Packets in the Buffer

The average number of RT packets in the buffer at the
steady state is:

$$N_{RT} = \lim_{t \to \infty} \frac{1}{t} \int_0^t X_1(s)\, ds$$

We can show that:

$$N_{RT} = \sum_{i=0}^{N} \sum_{j=0}^{N-i} i\, p(i,j)$$

We show also that the average number of NRT packets in
the buffer at the steady state is:

$$N_{NRT} = \sum_{j=0}^{N} \sum_{i=0}^{N-j} j\, p(i,j)$$

C. Mean Delay

Using Little's formula [10], we deduce that the average
delays of RT and NRT packets respectively are given by:

$$D_{RT} = \frac{N_{RT}}{\lambda_{RT}\,(1 - P_{L\text{-}RT})}$$




$$D_{NRT} = \frac{N_{RT} + N_{NRT}}{\lambda_{NRT}\,(1 - P_{L\text{-}NRT})}$$
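
As a rough numerical cross-check of the model, the equilibrium probabilities can also be computed directly from the admission rules of Section II rather than from the printed balance equations. The sketch below is an assumption-laden illustration, not the authors' Maple procedure: it restricts the states to those with i + j <= N, assumes 1 <= R <= N, applies the "less than R" test exactly as worded in Section II, and finds the stationary distribution by power iteration on the uniformized chain. All function names are hypothetical.

```python
def eb_tsp_transitions(N, R, lam_rt, lam_nrt, mu_rt, mu_nrt):
    """Transition rates of X(t) = (X1, X2) under the Section II rules."""
    assert 1 <= R <= N
    trans = {(i, j): [] for i in range(N + 1) for j in range(N + 1 - i)}
    for (i, j), out in trans.items():
        full = (i + j == N)
        # RT arrival: enters if there is room, or by pushing out an NRT packet.
        if not full:
            out.append(((i + 1, j), lam_rt))
        elif i < R:
            out.append(((i + 1, j - 1), lam_rt))
        # NRT arrival: enters if there is room, or by pushing out an RT packet.
        if not full:
            out.append(((i, j + 1), lam_nrt))
        elif i >= R:
            out.append(((i - 1, j + 1), lam_nrt))
        # Service: RT packets are always served before NRT packets.
        if i > 0:
            out.append(((i - 1, j), mu_rt))
        elif j > 0:
            out.append(((i, j - 1), mu_nrt))
    return trans


def stationary(trans, tol=1e-12, max_iter=200000):
    """Stationary distribution by power iteration on the uniformized chain."""
    # The uniformization constant exceeds every exit rate, so each state
    # keeps a self-loop and the discrete chain is aperiodic.
    big = 1.0 + max(sum(rate for _, rate in out) for out in trans.values())
    pi = {s: 1.0 / len(trans) for s in trans}
    for _ in range(max_iter):
        new = {s: 0.0 for s in trans}
        for s, out in trans.items():
            stay = 1.0
            for target, rate in out:
                new[target] += pi[s] * rate / big
                stay -= rate / big
            new[s] += pi[s] * stay
        if max(abs(new[s] - pi[s]) for s in trans) < tol:
            return new
        pi = new
    return pi


def eb_tsp_qos(N, R, lam_rt, lam_nrt, mu_rt, mu_nrt):
    """Loss probabilities and mean queue contents for both classes."""
    pi = stationary(eb_tsp_transitions(N, R, lam_rt, lam_nrt, mu_rt, mu_nrt))
    full_hi = sum(p for (i, j), p in pi.items() if i + j == N and i >= R)
    full_lo = sum(p for (i, j), p in pi.items() if i + j == N and i < R)
    return {
        "pi": pi,
        # An RT packet is lost on any arrival that finds a full buffer with i >= R.
        "loss_rt": full_hi * (lam_rt + lam_nrt) / lam_rt,
        # An NRT packet is lost on any arrival that finds a full buffer with i < R.
        "loss_nrt": full_lo * (lam_rt + lam_nrt) / lam_nrt,
        "mean_rt": sum(i * p for (i, j), p in pi.items()),
        "mean_nrt": sum(j * p for (i, j), p in pi.items()),
    }


if __name__ == "__main__":
    result = eb_tsp_qos(4, 2, 3.0, 2.0, 5.0, 4.0)
    print(result["loss_rt"], result["loss_nrt"])
```

The mean delays then follow from Little's formula using the mean queue contents and loss probabilities returned here. Power iteration is chosen only to keep the sketch dependency-free; a linear solver would be the usual choice for a buffer as large as the one used in the numerical section.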



V. Numerical Results 

In this section we present the numerical results for the EB-TSP
scheme. We use the Maple software to solve numerically the
system of equations given in III-C and to evaluate the QoS
measures. The numerical results for the EB-TSP scheme are
compared with the corresponding values for the basic TSP scheme. In the
simulations, we use the following parameters:



Parameter                             Value
Total queue length                    60
Threshold for number of RT packets    15
Arrival rate of NRT packets           8
Service rate of RT packets            30
Service rate of NRT packets           25

Table 1: Simulation parameters

Figure 2 plots the loss probability of the RT packets in
both the B-TSP and EB-TSP schemes. This figure shows that the
proposed scheme has a significant impact on the performance
of the system with respect to RT packet loss, and that this effect
becomes more important as the arrival rate of RT packets
grows. This leads to better quality for audio and video
calls received by the end user in an HSDPA cell using the EB-TSP
scheme.

Figure 2: Variation of the loss probability of RT packets
according to the arrival rate of RT packets

As expected, Figures 3, 4 and 5 show that the EB-TSP scheme
keeps the other QoS measures (the dropping
probability of NRT packets and the average delays of RT and
NRT packets) at the same level as the basic TSP scheme.

Figure 3: Variation of the average delay of RT packets
according to the arrival rate of RT packets

Figure 4: Variation of the average delay of NRT packets
according to the arrival rate of RT packets

Figure 5: Variation of the loss probability of NRT packets
according to the arrival rate of RT packets

VI. Conclusion

In this paper we have applied a new time-space priority
scheme (Enhanced Basic-TSP) in HSDPA where multiple
flows exist for an end user. This scheme overcomes a
limitation of the Basic-TSP scheme previously studied in the
literature, and achieves better management of buffer space.

We devise an ergodic continuous-time Markov chain (CTMC)
to characterize the transitions of the system. The QoS measures
of the proposed scheme are given analytically for both flows.
Numerical results show that EB-TSP has a significant
impact on RT packet dropping, and keeps the RT delay and
NRT packet dropping at the same level as the Basic-TSP
scheme. This implies an enhancement of the QoS
of the RT flow received by the end users.
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Abstract — Aligning multiple biological sequences such as in 
protein or DNA/RNA is a fundamental task in bioinformatics and 
sequence analysis. In the functional, structural and evolutionary 
studies of sequence data the role of multiple sequence alignment 
(MSA) cannot be denied. It is imperative that there is accurate 
alignment when predicting the RNA structure. MSA is a major 
bioinformatics challenge as it is NP-complete. In addition, the 
lack of a reliable scoring method makes it harder to align the 
sequences and evaluate the alignment outcomes. Scalability,
biological accuracy, and computational complexity must be taken
into consideration when solving the MSA problem. The harmony
search algorithm is a recent meta-heuristic method which has
been successfully applied to a number of optimization problems.
In this paper, an adapted harmony search algorithm (HS-MSA)
methodology is proposed to solve the MSA problem. In addition, a
hybrid method of finding the conserved regions using the Divide- 
and-Conquer (DAC) method is proposed to reduce the search 
space. The proposed method (HS-MSA) is extended to a parallel 
approach in order to exploit the benefits of the multi-core and 
GPU system so as to reduce computational complexity and time. 

Keyword: RNA, Multiple sequence alignment, Harmony search 
algorithm. 

I. Introduction 

Living organisms are related to each other through
evolution. A pair of organisms sometimes has a common
ancestor from which they evolved. MSA tries
to discover the similarities among the sequences and recover the
mutations that took place.

A sequence is an ordered list of symbols from an
alphabet S (20 amino acids for protein and 4
nucleotides for RNA/DNA). In bioinformatics, an RNA
sequence is written as, for example, s = AUUUCUGUAA. It is a string of
nucleotide symbols comprising adenine (A), cytosine (C),
guanine (G) and uracil (U): S = {A, C, G, U}.

Alignment is a method of arranging the sequences one over
the other to show the matches and mismatches between the
residues. A column of matching residues shows that no
mutation has occurred, whereas a column with mismatched
symbols indicates that mutation events have occurred.
To improve the alignment score, the character "-" is used to
represent a space introduced in the sequence. This space is
usually called a gap. A gap is viewed as an insertion in one
sequence and a deletion in the other. A score is used to measure
the alignment quality. The highest score of one indicates
the best alignment.
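
To make the notions of match, mismatch and gap columns concrete, the following minimal sketch classifies the columns of a small alignment. The function name `column_types` and the second example sequence are assumptions introduced here for illustration only.

```python
def column_types(aligned):
    """Label each alignment column as 'match', 'mismatch' or 'gap'."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")
    kinds = []
    for column in zip(*aligned):
        if "-" in column:
            kinds.append("gap")        # at least one sequence has a gap here
        elif len(set(column)) == 1:
            kinds.append("match")      # all residues in the column agree
        else:
            kinds.append("mismatch")   # residues differ: a substitution
    return kinds


# The first row is the example sequence from the text; the second row is
# a made-up relative with one deletion and one substitution.
alignment = ["AUUUCUGUAA",
             "AUU-CAGUAA"]
for symbol_pair, kind in zip(zip(*alignment), column_types(alignment)):
    print("".join(symbol_pair), kind)
```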

For clarity's sake, the generic MSA problem is expressed
using the following statement: "Insert gaps within a given set
of sequences in order to maximize a similarity criterion" [1].
Finding an accurate MSA from the sequences is very difficult.
It is a time-consuming, computationally NP-hard
problem [2, 3]. The MSA problem can be divided into three
difficulties: scalability, optimization, and the objective
function.

In fact, the complexity that arises from all the three 
problems must be solved simultaneously. The first problem, 
scalability, is about finding the alignment of many long 
sequences. The second problem, optimization, deals with 
finding the alignment with the highest score based on a given 
objective function among the sequences. Optimization of even 
a simple objective function is an NP-hard problem. The third 
problem, the objective function (OF), involves speeding up the 
calculation in order to measure the alignment. 

MSA covers two closely related problems: global MSA and
local MSA. Global MSA aligns sequences across their whole
length, while local MSA aligns certain parts of the sequences
and locates conserved regions along them, as shown in
Figure 1.

Figure 1. Global and local MSA

In bioinformatics, MSA is a major problem of interest and
constitutes the basis for other molecular biology analyses.
MSA has been used to address many critical problems in
bioinformatics. Studying these alignments provides scientists
with the information needed to determine the evolutionary
relationships between the sequences, find the sequences of a family,
detect the structure of protein/DNA, reveal sequence
homologies, predict the functions of protein/DNA sequences,
and predict patients' diseases or discover drug-like
compounds that can bind to the sequences.

In general, MSA is the primary step in secondary structure
prediction, particularly in the prediction of the
structure of RNA sequences. RNA structure prediction
is strongly affected by the quality of the
alignment [4]. Indeed, prediction of an accurate RNA secondary
structure relies on multiple sequence alignments to provide data
on co-varying bases [5]. MSA significantly improves the
accuracy of protein/RNA structure prediction. For example,
current RNA secondary structure prediction methods using
aligned sequences have achieved higher
prediction accuracy than those using a single sequence [6].
Nucleic acid sequences are the primary concern of our proposed
method, which aims to evaluate and improve the influence of alignment
tools on RNA secondary structure prediction.

Many different approaches have been proposed to solve the
MSA problem. Dynamic programming, progressive, iterative,
consistency and segment-based approaches are the most
commonly used [7]. Although many MSA
algorithms are available, a solution has yet to be found that is
applicable to all possible alignment situations [7].

It is a well-known fact that the MSA problem can be solved
by using the dynamic programming (DP) algorithm [8, 9].
Unfortunately, such an approach is notorious for its large
consumption of processing time. MSA with the sum-of-pairs
score has been shown to be an NP-complete
problem [10], [11]. Algorithms that provide the optimal solution
are time-consuming and have a running time that grows
exponentially with the number of sequences and
their lengths.

In essence, all widely used MSA tools seek an alignment
with a high sum-of-pairs score. This optimization problem is
NP-complete [2, 3] and thus motivates research into
heuristics. Over the last decade, evolutionary and meta-heuristic
approaches have been among the most recent approaches
used to solve this optimization problem.
Evolutionary and meta-heuristic algorithms have been used in
several problem domains, including science, commerce, and
engineering. Consequently, most practical MSA
algorithms are based on heuristics that obtain a reasonably
accurate MSA within a moderate computational time and
usually produce a quasi-optimal alignment. Although
many algorithms are now available, there is still room to
improve their computational complexity, accuracy, and
scalability.

In this paper, a novel algorithm (HS-MSA), based on the
meta-heuristic technique known as the harmony search algorithm, is
proposed to solve the long-standing MSA problem. The MSA problem is
viewed as an optimization problem and can be solved by
adapting the harmony search algorithm. Since the search space in
HS is wide, a modified algorithm is proposed (MHS-MSA) to
find the conserved blocks using well-known regions, and then
align the mismatched regions between the successive blocks to
form a final alignment. HS-MSA is extended to include the
divide-and-conquer (DCA) approach, in which DCA is used to
cut and combine the sub-sequences to form the final MSA.
Another proposed technique is to use the harmony search
algorithm as an MSA improver (HSI-MSA), in which the initial
alignment can be obtained from conventional algorithms or
their combinations. HS-MSA can be extended to a parallel
algorithm (PHS-MSA) in order to exploit the benefits of
multi-core and GPU systems to reduce computational
complexity and time.

This paper is organized as follows: Section 2 reviews the 
related literature and describes the state-of-the-art MSA 
approaches. Section 3 explains the proposed algorithm. The 
evaluation and analysis methodology that is used to assess our 
proposed algorithm is explained in Section 4. Lastly, Section 5 
provides the conclusion and summary of the paper. 

II. Literature Review 

There are several MSA algorithms reported in the literature.
For a deeper understanding of the MSA algorithms,
the basic concepts of MSA alignment representation, gap
penalty, alignment scores, dataset benchmarks, MSA
approaches, and the harmony search algorithm need to be
understood. Accordingly, subsection 2.1 briefly reviews the
representation of MSA alignment, followed by the details of the
gap penalty in subsection 2.2. The alignment scores, RNA
datasets and benchmarks, and current MSA approaches are
explained in subsections 2.3, 2.4 and 2.5 respectively.
Subsection 2.6 provides a summary of the MSA algorithms, and
subsection 2.7 concludes with the harmony search algorithm.

A. Representation of MSA Alignment

There are several ways to represent a multiple sequence
alignment. Usually, the final alignment is a listing of
the entire sequences, one over the other. However, during the
alignment process, it is helpful to encode the alignment of the
sequences in a form known as a representation. Some of the
representations that have been used in previous algorithms
include a bit matrix as used in [12], a matrix of gap positions as
used in [13], multiple number-strings as used
in [14], [15], [16], [17], string representation [18], [19], [20] as used
in SAGA [18], four parallel chromosomes as used in [21],
directed acyclic graph (DAG) as used in [22, 23], A-Bruijn
graph as used in [24-26], and dispersion graph as used in [27].

B. Gap Penalty

A negative score, or penalty, can be assigned to a set of
gaps. Two types of gap models mentioned in previous
reviews [28] are defined as follows:

Linear gap model - in this model a gap is always given
the same penalty wherever it is placed in the alignment.
The penalty is proportional to the length of the gap and is
given by gap = n x go, where go < 0 is the opening penalty
of a gap and n is the number of consecutive gaps.

Affine gap model - in this model a new gap and a gap
extension are not given the same penalty. The
insertion of a new gap has a greater penalty than the
extension of an existing gap; the penalty is given by gap = go + (n
- 1) x ge, where go < 0 is the gap opening penalty and ge
< 0 is the gap extension penalty, with |ge| <
|go|.
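
The two gap models can be written directly from the formulas above. The sketch below is illustrative only; the function names and the numeric penalties in the example are assumptions, not values taken from any MSA tool.

```python
def linear_gap(n, go):
    """Linear model: every one of the n gap symbols costs go (go < 0)."""
    return n * go


def affine_gap(n, go, ge):
    """Affine model: go for opening the gap, ge for each of the n - 1 extensions."""
    if n <= 0:
        return 0  # no gap, no penalty
    return go + (n - 1) * ge


# A run of three consecutive gaps under both models (hypothetical penalties).
print(linear_gap(3, -2))        # 3 x (-2)
print(affine_gap(3, -5, -1))    # -5 + 2 x (-1)
```

With |ge| < |go|, the affine model makes one long gap cheaper than several short ones of the same total length, which is the behaviour the model is intended to capture.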

C. Alignment Score 

The MSA objective function is defined for assessing the
alignment quality either explicitly or implicitly. An efficient
algorithm is used to find the optimal or a near-optimal
alignment according to the objective function. Matches,
mismatches, substitutions, insertions, and deletions need to be
scored by the scoring function. The scoring function can be
divided into two parts: substitution matrices and gap penalties.
The former provide a numerical score for matches and
mismatches, while the latter allow numerical quantification
of insertions and deletions. All possible transitions between the
20 amino acids, or the 4 nucleic acids, are represented in a
substitution matrix, which is a two-dimensional array of size 20 x
20 for amino acids and 4 x 4 for nucleic acids.

Usually, a simple matrix used for DNA or RNA sequences
assigns a positive value to a match and a negative
value to a mismatch [20]. Meanwhile, the scores for aligned protein
residues are given by log-odds [29] substitution matrices
such as PAM [30], GONNET [31], or BLOSUM [32].
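
As an illustration of how a simple nucleotide matrix and a gap penalty combine into a score, the sketch below builds a 4 x 4 match/mismatch matrix for RNA and scores one pair of aligned sequences. The names `simple_matrix` and `pair_score`, the values +1, -1 and -2, and the per-symbol (linear) gap cost are all assumptions chosen for the example.

```python
def simple_matrix(alphabet="ACGU", match=1, mismatch=-1):
    """Build a match/mismatch substitution matrix as a dictionary."""
    return {(x, y): (match if x == y else mismatch)
            for x in alphabet for y in alphabet}


def pair_score(a, b, matrix, gap=-2):
    """Score two aligned sequences column by column."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    score = 0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue                 # a column of two gaps carries no score
        if x == "-" or y == "-":
            score += gap             # insertion or deletion
        else:
            score += matrix[(x, y)]  # match or mismatch
    return score


rna = simple_matrix()
print(len(rna))                                      # 16 entries: 4 x 4
print(pair_score("AUUUCUGUAA", "AUU-CAGUAA", rna))   # 8 matches, 1 mismatch, 1 gap
```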

There are several models for assessing the score of a given
MSA, and many MSA tools have adopted one of them. A
brief review of the scoring methods that have been used to calculate
the alignment score is as follows:

Sum-of-Pairs (SP): It was introduced by Carrillo and
Lipman [10]. More details about the sum-of-pairs score will be
presented later.

Weighted sum-of-pairs score [33], [34]: The weighted
sum-of-pairs (WSP) score is an extension of the SP score in which
each pair-wise alignment score contributes differently
to the whole score.

Maximal expected accuracy (MEA) [35]: The basic idea of
MEA is to maximize the expected number of "correctly"
aligned residue pairs [36]. It has been used in the PRIME [37]
and ProbCons [38] algorithms.

Consistency-based Scoring: This consistency concept was 
originally introduced by Gotoh [9] and later refined by 
Vingron and Argos[39]. Consistency-based scoring is used 
in T-Coffee[40], MAFFT[41], and Align-m[42] 
algorithms. 

Probabilistic consistency Scoring function: This scoring 
function is introduced in ProbCons[38]. It is a novel 
modification of the traditional sum-of-pairs scoring 
system. This promising idea is implemented and extended 
in the PECAN[43], MUMMALS[44], PROMALS[45], 
ProbAlign[46] , ProDA[47], and PicXAA[48] programs. 

Segment-to-segment objective function: It is used by 
DIALIGN[49] to construct an alignment through 
comparison of the whole segments of the sequences rather 
than the residue-to-residue comparison. 

NorMD[50] objective function: It is a conservation-based 
score which measures the mean distance between the 
similarities of the residue pairs at each alignment column. 
NorMD is used in RASCAL[51] and AQUA[52]. 

MUSCLE profile scoring function: MUSCLE [53] uses a
scoring function which is defined for a pair of profile
positions. In addition to PSP, MUSCLE uses a new profile
function called the log-expectation (LE) score.
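
Of the scoring models listed above, the sum-of-pairs score and its weighted variant are the simplest to state in code. The sketch below uses the common definition (the sum, over all pairs of sequences, of their pairwise alignment scores), which is an assumption here since the details are deferred in the text. The match, mismatch and gap values, the treatment of gap-gap columns as zero, and the way weights are passed are also illustrative assumptions.

```python
from itertools import combinations


def _pairwise(a, b, match, mismatch, gap):
    """Score one pair of aligned rows; gap-gap columns contribute nothing."""
    score = 0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue
        if x == "-" or y == "-":
            score += gap
        elif x == y:
            score += match
        else:
            score += mismatch
    return score


def sum_of_pairs(alignment, weights=None, match=1, mismatch=-1, gap=-2):
    """SP score; with weights[(p, q)] given for p < q, the weighted SP score."""
    total = 0
    for (p, a), (q, b) in combinations(enumerate(alignment), 2):
        w = 1 if weights is None else weights[(p, q)]
        total += w * _pairwise(a, b, match, mismatch, gap)
    return total


msa = ["AUG-",
       "AUGC",
       "A-GC"]
print(sum_of_pairs(msa))
print(sum_of_pairs(msa, weights={(0, 1): 2, (0, 2): 1, (1, 2): 1}))
```

In practice the pair weights are usually derived from how closely related two sequences are; here they are simply supplied by the caller.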

D. RNA Database and Benchmarks 

Typically, a benchmark of reference alignments is used to
validate an MSA program. The accuracy score is obtained by
comparing the alignment of the test sequences produced by
the program with the corresponding reference alignment. Most
alignment programs have been extensively investigated for
proteins. To date, few attempts have been made to benchmark
them on nucleic acid sequences.
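
One common way to compare a test alignment with a reference alignment is a sum-of-pairs accuracy: the fraction of residue pairs aligned in the reference that are also aligned in the test alignment. The text does not say which accuracy measure is meant, so the sketch below is only one plausible instance, and the function names are hypothetical.

```python
from itertools import combinations


def aligned_pairs(alignment):
    """Set of residue pairs that share a column.

    Each residue is identified by (sequence index, position in the
    ungapped sequence), so the result does not depend on gap placement
    elsewhere in the alignment.
    """
    counters = [0] * len(alignment)
    pairs = set()
    for column in zip(*alignment):
        present = []
        for k, symbol in enumerate(column):
            if symbol != "-":
                present.append((k, counters[k]))
                counters[k] += 1
        pairs.update(combinations(present, 2))
    return pairs


def sp_accuracy(test, reference):
    """Fraction of reference residue pairs reproduced by the test alignment."""
    ref = aligned_pairs(reference)
    if not ref:
        return 0.0
    return len(aligned_pairs(test) & ref) / len(ref)


reference = ["AUG-C",
             "AUGGC"]
test = ["AU-GC",
        "AUGGC"]
print(sp_accuracy(test, reference))
```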

RNA reference alignments exist in several databases. It
must be noted that although these databases provide a
substantial amount of information to the specialist, they
differ in the file formats used and the data provided. A
brief review of the benchmarks and databases that have been
used for multiple RNA sequence alignment is given in
Table I.



TABLE I. Database and Benchmarks

RNA Database: Rfam [54], [55]
Description: It is a compilation of alignments and covariance models including many regular non-coding RNA families [55].
Website: http://rfam.sanger.ac.uk/ ; http://rfam.janelia.org/index.html

RNA Database: BRAliBase [56], [57]
Description: It is a compilation of RNA reference alignments especially designed for the benchmark of RNA alignment methods [57].
Website: http://www.biophys.uni-duesseldorf.de/bralibase/ ; http://projects.binf.ku.dk/pgardner/bralibase/

RNA Database: Comparative RNA Website (CRW) [58]
Description: It has alignments for rRNA (5S / 16S / 23S), Group I intron, Group II intron, and tRNA for various organisms [58].
Website: http://www.rna.ccbb.utexas.edu/

RNA Database: European Ribosomal RNA Database [59], [60]
Description: It is a collection of all complete or nearly complete SSU (small subunit) and LSU (large subunit) ribosomal RNA sequences available from public sequence databases [60].
Website: http://bioinformatics.psb.ugent.be/webtools/rRNA/

RNA Database: The Ribonuclease P Database [61]
Description: It contains a collection of sequence alignments, RNase P sequences, three-dimensional models, secondary structures, and accessory information [61].
Website: http://www.mbio.ncsu.edu/RnaseP/

RNA Database: 5S Ribosomal RNA Database [62]
Description: 5S rRNA is a component of the large subunit of most organellar ribosomes and of all cytoplasmic ribosomes. This database is intended to provide information on nucleotide sequences of 5S rRNAs and their genes [62].
Website: http://biobases.ibch.poznan.pl/5SData/

RNA Database: tmRNA [63]
Description: The tmRNA website (tmRNA is also known as 10Sa RNA or SsrA) contains a compilation of sequences, alignments, secondary structures and other information. It shows secondary structure, together with careful documentation [63].
Website: http://www.indiana.edu/~tmrna/

RNA Database: The tmRDB (tmRNA database) [64]
Description: tmRDB provides alignments and the secondary and tertiary structure of each tmRNA molecule. The alignment is available in several formats.
Website: http://www.ag.auburn.edu/mirror/tmRDB/

RNA Database: RNAdb [65], [66]
Description: It provides sequences and annotations for tens of thousands of non-coding RNAs.
Website: http://research.imb.uq.edu.au/rnadb/default.aspx

RNA Database: Noncoding RNA (ncRNA) database [67]
Description: It provides information on the sequences and functions of non-coding RNA transcripts (non-coding RNA does not code for proteins, but performs regulatory roles in the cell).
Website: http://biobases.ibch.poznan.pl/ncRNA/


E. Current MSA Approaches 

Much research on MSA algorithms has been published in
the last thirty years and reviewed by a few researchers, such
as [7], [68], [69], [70]. The published algorithms vary in the
order in which the sequences are aligned
and in the procedure used to align and score the sequences.
Existing algorithms can be classified into one or a combination
of the following basic approaches: exact, progressive, iterative,
group alignment, block-based, consistency-based,
probabilistic, computational intelligence, and heuristic. The
following subsections provide a brief overview of the
consistency-based, block-based and heuristic optimization
approaches, which are related in one way or
another to our proposed work. The consistency-based approach
is explained in subsection 2.5.1, followed by the block-based
approach in subsection 2.5.2. Finally, the heuristic
optimization approach is explained in subsection 2.5.3.

1) Consistency-based Approach 
The "consistency-based" approach is one of the strategies 
that has been proposed to improve the MSA scoring function. 
This approach tries to reduce the chance of early errors when 
constructing the alignment instead of correcting the existing 
errors via post processing[40],[38]. This is typically achieved 
by improving the pair-wise sequence quality based on other 
sequences in the alignment so as to obtain pair-wise alignments 
that are consistent with one another. This consistency strategy 
was originally described by Gotoh[9] and later refined by 
Vingron and Argos[39]. This strategy has been modified by 
several methods since then. 

SAGA [18] incorporated the optimization of the alignment with
COFFEE, a consistency-based objective function.

Later, Dialign2 [71] presented a consistency-based
method incorporating the segment-by-segment approach.

Similarly, Align-m [42] used local alignments as a guide to
a global, non-progressive alignment. Align-m used
pair-wise alignment consistency to find the parts that are
consistent with each other.

T-Coffee [40] also implemented this idea by using a
consistency-based alignment measure based on a library of
pair-wise alignments. This method was later brought into a
probabilistic framework by ProbCons [38], MUMMALS [44],
ProbAlign [46], PROMALS [45], and MSAProbs [72].

Nonetheless, a combination of different strategies can be
used. For instance, PCMA [73] (profile consistency multiple
sequence alignment) combined two different alignment
strategies, namely the progressive and consistency approaches.

2) Block-based Approach 

Block-based MSA is a method in which an alignment is 
constructed by first identifying the conserved regions into what 
is called "blocks". Then, the regions between the successive 
blocks are aligned to form a final alignment[74]. Block-based 
methods can be included in the consistency or probability- 
based^] approach. A block can be referred to a sub-sequence, 
a segment, a region, or a fragment[76]. A fragment is defined 
as pairs of ungapped segments of the input sequences [77]. A 
weight score is assigned to each possible fragment to find the 
consistent fragments with high overall sum of fragment scores. 
Those fragments are integrated from a pair-wise alignment into 
a multiple alignment. 
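
As a very rough illustration of what identifying candidate blocks can look like, the sketch below lists the ungapped segments of a fixed length k that occur in every input sequence. This is a deliberately simplified stand-in, not the procedure of any of the cited block-based methods: exact matching, a fixed k, and the function name `shared_kmers` are all assumptions made for the example.

```python
def shared_kmers(seqs, k):
    """Map each k-mer found in every sequence to its start positions.

    The result maps a k-mer to a list with one entry per input sequence,
    each entry being the list of positions where the k-mer starts.
    """
    def index(seq):
        positions = {}
        for i in range(len(seq) - k + 1):
            positions.setdefault(seq[i:i + k], []).append(i)
        return positions

    tables = [index(seq) for seq in seqs]
    common = set(tables[0])
    for table in tables[1:]:
        common &= set(table)   # keep only k-mers present in every sequence
    return {kmer: [table[kmer] for table in tables] for kmer in sorted(common)}


# Three made-up RNA fragments that share the segment AUGG.
print(shared_kmers(["AUGGCUA", "CAUGGAU", "GGAUGGC"], 4))
```

Real block-based methods additionally tolerate mismatches, weight the fragments, and select a mutually consistent subset, which is where most of the computational cost lies.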

Searching for these conserved blocks is very time-consuming
in many block-based methods. Therefore, the key
issue is how to construct the possible set of blocks
efficiently [75].

Some earlier algorithms, such as those of Boguski et al.[78], 
Miller[79], and Miller et al.[80], construct blocks either 
from pair-wise alignments or from segments not matched by all 
N sequences. Instead of starting from pair-wise alignments, 
Match-Box[81] aims to identify conserved blocks (or boxes) 
among the sequences without performing a pair-wise alignment. 
Similarly, Zhao and Jiang[74] introduced the BMA algorithm, 
which allows for internal gaps and some degree of mismatch in 
the method used to identify the blocks. 

Based on a combination of local and global alignment, 
Dialign[71],[82],[83] makes extensive use of segment-by-segment 
methods. It combines local and global alignment features by 
identifying and adding the conserved regions (blocks) shared 
between the sequences based on their consistency weights. 

Based on anchored alignment, CHAOS[84] uses fast local 
alignments as "seeds" for a slower global alignment. CHAOS has 
been used to improve DIALIGN[71] and LAGAN[85]. 

Recently, Wang et al.[75] produced a block-based 
algorithm called BlockMSA. It combined the biclustering and 
divide-and-conquer approaches to align the sequences. 

3) Heuristic Optimization Approaches 

Many optimization problems from various fields have been 
solved by using diverse optimization algorithms. 
Computational intelligence (CI) plays an important role in 
solving the sequence alignment problem. Evolutionary 
algorithms have the advantage of operating on several 
solutions simultaneously, combining an exploratory search 
through the solution space with the exploitation of current 
results[15]. They impose no restrictions on the number of 
sequences or their lengths, and they are very flexible in 
optimizing the solution with low complexity. Many efforts have 
attempted to solve the MSA problem using evolutionary 
programming[86],[87]. Because MSA is computationally hard, no 
single method solves it best in all cases. 

Heuristic optimization approaches include genetic 
algorithms, ant colony optimization, swarm intelligence, 
simulated annealing, tabu search, and combinations thereof. 
The following subsections explain several of these techniques 
and show how they are applied to solve MSA problems. 

a) Genetic Algorithm 

Genetic Algorithm (GA) is a heuristic that performs an 
adaptive search to find optimal solutions of large-scale 
optimization problems with multiple local minima[15] using 
techniques that simulate natural evolution. 

GA is well suited for solving some NP-complete problems 
such as MSA. Sequence Alignment by Genetic Algorithm 
(SAGA)[18] is the earliest GA used to solve MSA problems. 
Within the GA approach, different methods can be applied to 
solve the MSA problem, such as those used 
in[13],[12],[17],[88],[19],[20]. 

Some methods are hybrids with other approaches. Zhang 
and Wong[89] presented a method that used a pair-wise dynamic 
programming (DP) technique based on GA. Similarly, the use of 
GA in a progressive approach was presented in[90]. Later, 
Wang and Lefkowitz[91] produced the GenAlignRefine algorithm, 
which uses a genetic algorithm to improve local region 
alignment and thereby the overall quality of global multiple 
alignments. In[92], GA is used as an iterative method to refine 
the alignment score obtained by the progressive method. The use 
of GA to find the cut-off point in the divide-and-conquer 
approach is presented in[93]. Using a similar combination, a 
novel algorithm combining a genetic algorithm with ant colony 
optimization (GA-ACO) was presented by Lee et al.[94]. Chen et 
al.[95] reported a method that employs a new selection scheme 
to avoid premature convergence in GAs. Taheri and Zomaya[96] 
presented RBT-GA, a combination of the Rubber Band Technique 
(RBT) and the Genetic Algorithm (GA). Jeevitesh et al.[97] 
proposed the PASA algorithm, which takes the alignment outputs 
of two MSA programs, M-Coffee and ProbCons, and combines them 
in a genetic algorithm model. 

b) Ant Colony 

Ant colony optimization (ACO) is a probabilistic technique 
for solving computational problems and belongs to the family 
of swarm intelligence methods. The ACO algorithm is used as a 
cooperative search algorithm in solving optimization problems. 
ACO was inspired by the observation of the activities of real 
ants[98],[99],[100]. Recently, ACO has been used to solve 
NP-complete problems. It has shown efficiency in solving MSA 
problems, such as in[101],[102], where each proposed algorithm 
was based on ant colony optimization and the divide-and-conquer 
technique. Other researchers[103],[104],[27],[105] also relied 
on the ant colony to solve the MSA problem. 



c) Particle Swarm Optimization 

Particle swarm optimization (PSO) is a swarm intelligence 
technique for numerical optimization. It simulates the 
behaviour of bird flocking or fish schooling. PSO was 
presented by Kennedy and Eberhart[106] in 1995. The 
simplicity of implementation, quick convergence, and few 
parameters have resulted in PSO gaining popularity. 

Many researchers have modified the PSO idea and applied the 
technique widely to MSA problems. Rasmussen and Krink[107] 
used a combination of particle swarm optimization and 
evolutionary algorithms to train HMMs for protein sequence 
alignment. Meanwhile, Pedro et al.[108] presented an algorithm 
based on PSO to improve a sequence alignment previously 
obtained using ClustalX. Juang and Su[109] produced an 
algorithm that combines pair-wise DP and particle swarm 
optimization (PSO) to overcome the local optimum problem. Xu 
and Chen[110] designed an improved particle swarm optimization 
to solve MSA. Based on the idea of chaos optimization, Lei et 
al.[111] produced chaotic PSO (CPSO) to solve MSA. A novel 
algorithm of mutation-based binary particle swarm optimization 
(M-BPSO) was presented by Hai-Xia et al.[112] for solving MSA. 

d) Simulated Annealing 

Simulated annealing (SA) was described by 
Kirkpatrick[113]. It is an algorithm that attempts to simulate 
the physical process of annealing. Its basic concept is based 
on observing the change of energy as a material solidifies 
from the liquid state to the solid state[114]. 

Several SA algorithms have been used to solve the MSA 
problem. Kim et al.[115] used simulated annealing to develop 
the MSASA algorithm for solving MSA. Uren et al.[116] 
presented MAUSA, which uses simulated annealing to search 
through the space of possible guide trees. Meanwhile, Keith et 
al.[117] described a new algorithm for finding a consensus 
sequence by using the SA method. Omar et al.[118] produced a 
combination of Genetic Algorithm and Simulated Annealing to 
solve MSA problems. Roc[114] presented a method for multiple 
DNA sequence alignment in which an optimal cut-off point is 
chosen by the genetic simulated annealing (GSA) technique. Joo 
et al.[119] presented a new method called MSACSA for MSA, 
which is based on conformational space annealing (CSA). CSA 
combines three traditional global optimization methods, that 
is, SA, genetic algorithm (GA), and Monte Carlo with 
minimization (MCM). 

e) Tabu Search 

Tabu search (TS) is a meta-heuristic approach used to solve 
combinatorial optimization problems. Tabu search and simulated 
annealing are similar in that both traverse the solution space 
by testing mutations of an individual solution. However, they 
differ in the number of generated solutions. While simulated 
annealing generates only one mutated solution, tabu search 
generates many mutated solutions and moves to the solution 
with the lowest energy of those generated. TS has been used to 
solve MSA problems. Riaz et al.[120] implemented the adaptive 
memory features of tabu search to refine MSA. Lightner[121] 
used a tabu search approach to obtain multiple sequence 
alignments and explored iterative refinement techniques, such 
as the hidden Markov model and the intensification heuristic 
approach, to further improve the alignment. 



F. Summary of Related Algorithms for MSA 

Table 2 lists the most current algorithms in use. The list 
is not exhaustive but includes the algorithms most closely 
related to those explained above. The Online Availability 
column gives the link to the online server or the site from 
which the particular algorithm can be accessed or downloaded. 



TABLE II. Current MSA Algorithms 

| Algorithm | Approach | RNA | Online Availability | Reference |
|---|---|---|---|---|
| MAFFT | Consistency | Y | http://mafft.cbrc.jp/alignment/server/ | [122] |
| MUSCLE | Progressive/refinement | Y | http://www.ebi.ac.uk/Tools/msa/muscle/ | [123] |
| Dialign2 | Consistency/segment | Y | http://bibiserv.techfak.uni-bielefeld.de/cgi-bin/dialign_submit | [71] |
| Align-m | Consistency | N | http://bioinformatics.vub.ac.be/software/software.html | [42] |
| BlockMSA | 3-way consistency/Block/DCA | Y | http://aug.csres.utexas.edu/msa/ | [75] |
| MAUSA | SA | N | http://eprints.utas.edu.au/208/ | [116] |
| SAGA | Iterative/Stochastic/GA | Y | http://www.tcoffee.org/Projects_home_page/saga_home_page.html | [18] |
| Mishima | k-tuple | Y | http://esper.lab.nig.ac.jp/study/mishima/ | [124] |
| MSAProbs | Pair-HMM and partition function | Y | http://sourceforge.net/projects/msaprobs/ | [72] |
| pecan | Consistency/progressive | - | http://www.ebi.ac.uk/~bjp/pecan/ | [43] |
| PicXAA | Posterior probability/consistency | Y | http://www.ece.tamu.edu/~bjyoon/picxaa/ | [48] |
| PRIME | Group-to-group/anchor | Y | http://prime.cbrc.jp/ | [37] |
| ProAlign | HMM/progressive | Y | http://applications.lanevol.org/ProAlign/ | [125] |
| PROBCONS | Posterior probability, pair-HMM | N | http://probcons.stanford.edu/index.html | [38] |
| ProDA | Repeated and shuffled elements | Y | http://proda.stanford.edu/ | [47] |
| Probalign | Posterior probabilities | Y | http://probalign.njit.edu/probalign/login | [46] |
| REFINER | Refinement/Block | - | ftp://ftp.ncbi.nih.gov/pub/REFINER | [126], [127] |
| AIMSA | Region | - | - | [128] |
| PRALINE | Profile/iterative/progressive | - | http://www.ibi.vu.nl/programs/pralinewww/ | [129] |
| T-COFFEE | Consistency/progressive | Y | http://www.tcoffee.org/ | [40] |
| MUMMALS | Probability HMM | N | http://prodata.swmed.edu/mummals/mummals.php | [44] |
| PROMALS | k-mer/Pair-HMM consistency | Y | http://prodata.swmed.edu/promals/promals.php | [45] |
| PCMA | k-mer/Profile/consistency | - | ftp://iole.swmed.edu/pub/PCMA/pcma/ | [73] |
| BMA | Conserved block | Y | - | [74] |
| GA-ACO | GA and ant colony | - | - | [94] |
| PASA | Refine by GA | - | - | [97] |



G. Harmony Search Algorithm 

The harmony search algorithm (HS) was developed by 
Geem[130]. HS is a meta-heuristic optimization algorithm 
inspired by music. 

HS simulates a team of musicians together trying to seek 
the best state of harmony. Each player generates a sound based 
on one of three options (memory consideration, pitch 
adjustment, and random selection). This is equivalent to 
finding the optimal solution in an optimization process. 

Geem et al.[130] model the HS components as three 
quantitative optimization processes, as follows: 

The harmony memory (HM): It is used to keep good 
harmonies. A harmony from the HM is selected randomly 
based on a parameter called the harmony memory 
considering (or accepting) rate, $HMCR \in [0,1]$. Typical 
values are HMCR = 0.7-0.95. 

The pitch adjustment: It is similar to a local search. It is 
used to generate a slightly different solution from the HM 
depending on the pitch-adjusting rate (PAR). The degree of 
the adjustment is set by the pitch bandwidth ($b_{range}$). 
Most applications use PAR = 0.1-0.5. 

The random selection: A new harmony is generated 
randomly to increase the diversity of the solutions. The 
probability of randomization is $P_{random} = 1 - HMCR$, and 
the actual probability of pitch adjustment is 
$P_{pitch} = HMCR \times PAR$. 
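
As a quick numerical check of these probabilities, assume the illustrative settings HMCR = 0.9 and PAR = 0.3. These values are picked from within the typical ranges quoted above and are not taken from the cited work. The third line, the probability of reusing a memory value unchanged, follows from the two definitions:

```latex
\begin{align*}
P_{random} &= 1 - HMCR = 1 - 0.9 = 0.10,\\
P_{pitch}  &= HMCR \times PAR = 0.9 \times 0.3 = 0.27,\\
P_{memory} &= HMCR \times (1 - PAR) = 0.9 \times 0.7 = 0.63.
\end{align*}
```

The three probabilities sum to 1, so under these assumed settings roughly two thirds of the improvised values are copied from memory as they are.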

The pseudo code of the basic HS algorithm with these three 
components is summarized in Figure 2. 

Harmony Search Algorithm 

Begin 
    Declare the objective function f(x), x = (x1, x2, ..., xn) 
    Initialize the harmony memory accepting rate (HMCR) 
    Initialize the pitch adjusting rate (PAR) and other parameters 
    Initialize the Harmony Memory (HM) with random harmonies 
    While (t < max number of iterations) 
        If (rand < HMCR) 
            Choose a value from HM 
            If (rand < PAR) 
                Adjust the value by adding a certain amount 
            End if 
        Else 
            Choose a new random value 
        End if 
        Calculate the objective function 
        Accept the new harmony (solution) if better 
        Update HM 
    End while 
    Find the current best solution in HM 
End 

Figure 2. Pseudo Code of the Harmony Search Algorithm[131] 
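
The pseudo code in Figure 2 can be sketched in Python as follows. This is a minimal illustration for a continuous minimization problem, not the implementation from the cited work: the function name `harmony_search`, the bandwidth parameter `bw`, the default parameter values, and the `sphere` test function are all assumptions made for the example.

```python
import random


def harmony_search(f, bounds, hms=10, hmcr=0.9, par=0.3, bw=0.05,
                   max_iter=2000, seed=0):
    """Minimize f over the box `bounds` with a basic harmony search (sketch)."""
    rng = random.Random(seed)
    # Initialize the harmony memory (HM) with hms random harmonies.
    hm = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(hms)]
    scores = [f(x) for x in hm]
    for _ in range(max_iter):
        new = []
        for d, (lo, hi) in enumerate(bounds):
            if rng.random() < hmcr:
                # Memory consideration: reuse a value stored in HM.
                value = hm[rng.randrange(hms)][d]
                if rng.random() < par:
                    # Pitch adjustment: shift by a small amount (assumed
                    # here to be a fraction bw of the variable range).
                    value += rng.uniform(-1.0, 1.0) * bw * (hi - lo)
                    value = min(hi, max(lo, value))
            else:
                # Random selection: draw a fresh value.
                value = rng.uniform(lo, hi)
            new.append(value)
        new_score = f(new)
        # Accept the new harmony only if it beats the worst one in HM.
        worst = max(range(hms), key=scores.__getitem__)
        if new_score < scores[worst]:
            hm[worst], scores[worst] = new, new_score
    best = min(range(hms), key=scores.__getitem__)
    return hm[best], scores[best]


def sphere(x):
    """Hypothetical test objective: sum of squares, minimum 0 at the origin."""
    return sum(v * v for v in x)


best_x, best_f = harmony_search(sphere, [(-5.0, 5.0)] * 3)
```

One new harmony is improvised per iteration and replaces the worst member of the memory only when it scores better, which is one common reading of the "accept if better, update HM" lines in the figure.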

Later, Geem[132] proposed an ensemble harmony search 
(EHS) where a new ensemble consideration operation is added 
to the original HS structure. The new operation takes into 
account the relationship among the decision variables, and the 
value of each decision variable can be chosen based on the 
other variables. 

Thereafter, Mahdavi et al.[133] produced an improved 
harmony search (IHS), in which the parameter PAR and pitch 
bandwidth are adjusted dynamically in the improvisation step. 

Subsequently, Omran and Mahdavi[134] proposed a global-best 
harmony search (GHS) in which the performance of HS is 
improved by borrowing concepts from swarm intelligence to 
modify the pitch-adjustment step such that the new harmony is 
assigned by the best harmony in the HM. 

Meanwhile, Pan et al.[135] produced a local-best harmony 
search algorithm with dynamic subpopulations (DLHS) for 
solving continuous optimization problems. The DLHS 
algorithm differs from the existing HS in that the whole 
harmony memory (HM) is divided into many sub-HMs and 
independent processes are performed in each sub-HM. A 
periodic regrouping schedule is used to exchange information 
between the sub-HMs, so that the population diversity and the 
improvement in the accuracy of the final solution are 
maintained. In addition, the parameters are adjusted using a 
newly developed adaptive strategy so that they suit the 
particular problem or phase of the search process. 

Recently, Zou et al.[136] proposed a novel global harmony 
search algorithm (NGHS) to solve reliability problems. 

NGHS modifies the improvisation step of the HS. Position 
updating and genetic mutation are new operations included in 
NGHS. Position updating enables the worst harmony of the HM 
to move rapidly toward the global best harmony, while genetic 
mutation prevents NGHS from becoming trapped in a local 
optimum. 

III. The Proposed Algorithm 

In this article, several algorithms are proposed to solve 
the MSA problem by using the adapted harmony search 
algorithm (HS). The adaptive HS for MSA is explained in 
subsection 3.1. A modified HS algorithm for reducing the 
search space is explained in subsection 3.2. Subsection 3.3 
describes the HS Improver. Finally, subsection 3.4 introduces 
a parallel HS-MSA, which can be implemented on different 
parallel platforms such as multi-core CPUs and GPUs. Figure 3 
shows the stages of the proposed research framework. 

A. Proposed Harmony Search Algorithm for MSA 

The main goal of MSA algorithms is to detect and align 
the homologous regions across the different sequences. This is 
achieved by optimizing an objective function that measures the 
quality of the alignment. Harmony search is a relatively new 
meta-heuristic optimization algorithm that has been applied to 
NP-complete problems[137]. This subsection explains how the 
harmony search algorithm can be used to solve the MSA 
problem. The alignment representation, objective function, 
harmony memory initialization, and adaptive harmony search 
algorithm for MSA are explained in greater detail. 

1) Alignment Representation 

An alignment of $N$ sequences with lengths $L_1$ to $L_N$ 
is represented as an $N \times W$ matrix in which each row 
contains the gap positions encoded for one sequence. The 
length of the rows in the matrix is 
$W = \lceil a L_{max} \rceil$, where 
$L_{max} = \max\{L_1, L_2, \ldots, L_N\}$, $\lceil x \rceil$ 
is the smallest integer greater than or equal to $x$, and the 
parameter $a$ is a scaling factor[86]. The value of $a$ is 
chosen according to the probability distribution. It can be 
1.2 as used in[94] or 1.5 as used in[138],[13],[20]. Choosing 
1.2 allows the aligned sequences to be 20% longer than the 
longest sequence, whereas choosing 1.5 allows the alignment to 
be 50% longer than the longest sequence in the test, as 
in[138]. 

2) Objective Function 

To find the optimal solution in the HS-MSA, the sum-of- 
pairs (SP) score described in[139],[140],[10],[107] will be used 
to calculate the Objective Function (OF) where there is no prior 
knowledge of the reference alignment. The general form of the 
OF score of an alignment of $n$ sequences consisting of $M$ 
columns is: 

$$OF = \sum_{i=1}^{l} \{ S_n(m_i) - G_n(m_i) \},$$ 

where $S_n(m_i)$ is the similarity score of the column $m_i$, 
$G_n(m_i)$ is the gap penalty of the column $m_i$, and $l$ is 
the length of the aligned sequences (the number of columns). 
The similarity score of the column $m_i$ can be measured by the 
sum-of-pairs (SP). The SP-score $S(m_i)$ for the $i$-th column 
$m_i$ is calculated as follows: 

$$S(m_i) = \sum_{j=1}^{n-1} \sum_{k=j+1}^{n} s(m_i^j, m_i^k),$$ 

where $m_i^j$ is the $j$-th row in the $i$-th column. For 
aligning two residues $x$ and $y$, the substitution matrix 
$s(x,y)$ is used to give the similarity score. 
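
A minimal Python sketch of this objective function is given below. It rests on assumptions that are not part of the source: the substitution matrix s(x, y) is replaced by a simple match/mismatch score, pairs involving a gap contribute nothing to the similarity term, and the gap penalty is a fixed cost per gap symbol. The function names and default values are hypothetical.

```python
from itertools import combinations


def column_sp_score(column, match=1, mismatch=-1):
    """Sum-of-pairs similarity of one column (toy match/mismatch scoring)."""
    total = 0
    for x, y in combinations(column, 2):
        if x == '-' or y == '-':
            continue  # assumption: gap pairs are handled by the gap penalty
        total += match if x == y else mismatch
    return total


def column_gap_penalty(column, gap_cost=2):
    """Assumed linear gap penalty: a fixed cost for every gap in the column."""
    return gap_cost * column.count('-')


def objective(alignment):
    """OF = sum over columns of (similarity score - gap penalty)."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all rows of an alignment must have equal length")
    return sum(column_sp_score(col) - column_gap_penalty(col)
               for col in zip(*alignment))


example = ["AC-G",
           "ACTG",
           "A-TG"]
example_score = objective(example)
```

For the three-row example, the two fully matched columns score 3 each and the two columns with one gap score 1 - 2 = -1 each, so the total is 4.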

3) Harmony Memory Initialization 
For a given set of 5 sequences, the procedure to initialize 
the harmony memory is as follows: the maximum sequence length 
is MaxS = 7, the minimum sequence length is MinS = 4, the 
maximum length of the alignment is 
$W = \lceil 1.2 \times 7 \rceil = 9$, the maximum number of 
gaps in sequence $S_i$ is $(W - L_i)$, where $L_i$ is the 
length of sequence $i$, and the maximum number of gaps is 
Gs = 9 - 4 = 5. 

Figure 4. Harmony memory initialization (panel B: aligned sequence) 



The initial harmony memory is randomly generated and the 
rows are initialized in the following way: First, for each 
sequence $S_i$ with length $L_i$, a random set of $W - L_i$ 
gap positions is drawn from the range 1 to $W$. Second, those 
$W - L_i$ numbers are sorted and used to indicate where the 
corresponding gaps are placed in the matrix. Finally, the 
positions in the matrix row that are not occupied by gaps are 
filled with the base symbols taken from the original sequence. 

The random initialization procedure that produces the initial 
harmony memory is illustrated in Figure 4. It is similar to 
the procedure used in[94]. The first difference in our 
procedure is that the gap positions are generated rather than 
the residue positions as in[94]; fewer gap positions than 
residue positions generally need to be generated for each 
sequence. The second difference, related to the first step, is 
that the number of generated positions is $(W - L_i)$ and not 
$W$ as in[94]. 
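
The initialization procedure can be sketched in Python as follows. The sketch assumes 0-based column indices (the text counts from 1 to W) and a scaling factor of 1.2; the function names `random_gapped_row` and `init_harmony_memory` and the example sequences are hypothetical.

```python
import math
import random


def random_gapped_row(seq, width, rng):
    """Place width - len(seq) gaps at random columns and fill in the residues."""
    n_gaps = width - len(seq)
    # Steps 1 and 2: draw distinct gap positions, then sort them.
    gap_positions = sorted(rng.sample(range(width), n_gaps))
    gaps = set(gap_positions)
    # Step 3: every non-gap column takes the next residue of the sequence.
    residues = iter(seq)
    row = ''.join('-' if col in gaps else next(residues)
                  for col in range(width))
    return gap_positions, row


def init_harmony_memory(seqs, hms, scale=1.2, seed=0):
    """Build hms random alignments of width W = ceil(scale * L_max)."""
    rng = random.Random(seed)
    width = math.ceil(scale * max(len(s) for s in seqs))
    memory = [[random_gapped_row(s, width, rng) for s in seqs]
              for _ in range(hms)]
    return width, memory


# Five sequences with maximum length 7 and minimum length 4, as in the text.
seqs = ["ACGUACG", "ACGU", "ACGUA", "ACGUAC", "ACGUAC"]
width, memory = init_harmony_memory(seqs, hms=3)
```

Storing the sorted gap positions next to each row keeps the compact encoding described above, while the gapped string is convenient for scoring.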

4) Adaptive Harmony Search Algorithm for MSA (AHS-MSA) 

The purpose of AHS-MSA is to aid scientists in producing 
high-quality MSAs that may lead to better RNA structure 
prediction (Figure 5) and help with other issues in molecular 
biology. In reviewing the approaches to solving the MSA 
problem or to predicting multiple RNA secondary structures, we 
have found no studies to date that incorporate the harmony 
search algorithm. The only research that has involved HS in 
bioinformatics is that of Mohsen et al.[141], which predicted 
the secondary structure of a single RNA sequence based on 
Minimum Free Energy. 



[Figure 5 diagram: RNA sequences -> MSA algorithm -> aligned RNA sequences -> RNA 2D structure prediction algorithm] 

Figure 5. The impact of MSA in RNA secondary structure prediction 



The HS algorithm has been successfully applied to several 
optimization problems[142]. This study therefore aims to 
investigate the use and adaptation of the HS algorithm in 
finding solutions to MSA problems. The MSA problem can be 
considered as an optimization problem in which accuracy, 
complexity, and speed should be compromised as little as 
possible. MSA can be solved by adapting the harmony search 
algorithm. Moreover, HS possesses several advantages over 
conventional optimization techniques[143], such as: 

1. HS does not require initial value settings for decision 
variables; 

2. HS is a population-based meta-heuristic algorithm, which 
means that a group of multiple harmonies can be used 
simultaneously. Proper parallelism usually leads to better 
performance with higher efficiency and speed; 

3. HS uses stochastic random searches which explore the 
search space more widely and efficiently; 

4. HS does not need derivation information; 

5. HS is less sensitive to chosen parameters; 

6. HS can solve various NP-complete problems[137]; 

7. The structure of the HS algorithm is relatively simple; 

8. HS is a very successful meta-heuristic algorithm due to its 
way of handling intensification and diversification; 

9. HS is very versatile, being able to combine with other 
meta-heuristic algorithms[134]. 

These characteristics increase the reliability and flexibility 
of the HS algorithm in producing better solutions. 

The AHS-MSA algorithm, as described in Figure 6, 
combines and adapts the HS idea to solve the MSA problem. 
The steps of the AHS-MSA algorithm are as follows: 

1. Initialize the harmony parameters (HMCR, PAR, NI, and 
HMS). 

2. Initialize the harmony memory with HMS random 
harmonies. Each solution is an alignment. 

3. Calculate the objective function (OF) for each harmony. 

4. Improvise the new harmony. 

5. Accept/reject the new harmony. 

6. Update the harmony memory. 



[Figure 6 flowchart; recoverable box labels: Initialize Parameters, HM of alignment (HM), Objective Function] 

Figure 6. The flowchart of the proposed HS-MSA algorithm 
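
The six steps of AHS-MSA can be sketched end to end in Python. This is only one possible reading under stated assumptions, not the authors' implementation: a harmony is a list of sorted gap-position lists (one per sequence), improvisation is done row by row, pitch adjustment moves a single gap to a free column, and the objective is a toy sum-of-pairs score. All function names and parameter defaults are hypothetical.

```python
import math
import random
from itertools import combinations


def build_row(seq, gaps, width):
    """Expand a sequence and its gap columns into a gapped string."""
    gap_set, residues = set(gaps), iter(seq)
    return ''.join('-' if c in gap_set else next(residues)
                   for c in range(width))


def score(rows):
    """Toy objective: +1 match, -1 mismatch, -1 per gap symbol.
    With a fixed width the gap term is a constant offset; it is kept
    only to mirror the 'similarity minus gap penalty' form of the OF."""
    total = 0
    for col in zip(*rows):
        for x, y in combinations(col, 2):
            if x != '-' and y != '-':
                total += 1 if x == y else -1
        total -= col.count('-')
    return total


def random_gaps(seq, width, rng):
    return sorted(rng.sample(range(width), width - len(seq)))


def pitch_adjust(gaps, width, rng):
    """Assumed pitch adjustment: move one gap to a currently free column."""
    free = [c for c in range(width) if c not in gaps]
    if not gaps or not free:
        return gaps
    moved = list(gaps)
    moved[rng.randrange(len(moved))] = rng.choice(free)
    return sorted(moved)


def ahs_msa(seqs, hms=10, hmcr=0.9, par=0.3, ni=500, scale=1.2, seed=0):
    rng = random.Random(seed)                       # step 1: parameters
    width = math.ceil(scale * max(map(len, seqs)))

    def to_rows(harmony):
        return [build_row(s, g, width) for s, g in zip(seqs, harmony)]

    # Steps 2 and 3: random harmony memory and its objective values.
    hm = [[random_gaps(s, width, rng) for s in seqs] for _ in range(hms)]
    scores = [score(to_rows(h)) for h in hm]
    for _ in range(ni):
        new = []                                    # step 4: improvise
        for i, s in enumerate(seqs):
            if rng.random() < hmcr:
                gaps = hm[rng.randrange(hms)][i]
                if rng.random() < par:
                    gaps = pitch_adjust(gaps, width, rng)
            else:
                gaps = random_gaps(s, width, rng)
            new.append(gaps)
        new_score = score(to_rows(new))
        # Steps 5 and 6: replace the worst harmony if the new one is better.
        worst = min(range(hms), key=scores.__getitem__)
        if new_score > scores[worst]:
            hm[worst], scores[worst] = new, new_score
    best = max(range(hms), key=scores.__getitem__)
    return to_rows(hm[best]), scores[best]


seqs = ["ACGUACG", "ACGUCG", "AGUACG"]
rows, best_score = ahs_msa(seqs, ni=200)
```

Because every row always holds exactly W - L_i gaps, each improvised harmony is a valid alignment of the input sequences by construction.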

B. A Modified Harmony Search Algorithm for MSA (MHS- 
MSA) 

To reduce the search space, a combination of methods is 
proposed. A hybrid method of HS and a segment-based 
approach is proposed and explained in the next subsection 
3.2.1. In subsection 3.2.2, a hybrid method of HS and a 
combination of segment-based and divide-and-conquer 
approaches are proposed and explained. 

3.2.1 A Harmony Search algorithm with a Segment-based 
Approach 

Identifying areas of local conservation before finding the 
global alignment has lately been gaining popularity among 
researchers. Conserved regions can be a helpful guide in 
identifying the homology of sequences and assisting the 
process of MSA. This idea is not new and has been 
implemented in other algorithms such as DIALIGN[49], 
MLAGAN[85], CHAOS[84], align-m[42], and MAFFT[144], 
where blocks are first detected from the pair-wise sequence 
alignments and that information is then used to build the MSA. 
Another algorithm, MISHIMA[124], also used this idea, 
exploring and analyzing k-tuples from the original sequences. 
In the same way, well-aligned regions were identified in 
RASCAL[51],[128], where a consistency-based objective 
function called NorMD[50] was used. 

The method proposed here reduces the search space of the 
previous AHS-MSA algorithm by combining pair-wise alignments 
into multiple alignments. It works by finding the conserved 
blocks across all the sequences before starting the MSA 
process. It explores all possible regions, which makes the 
resulting blocks more correct and consistent. All matched 
blocks are used to guide the multiple alignment. The idea is 
first to detect the conserved blocks in the sequences 
pair-wise and then to apply HS to build the MSA from those 
conserved columns. 

The multiple alignment search space can be narrowed down 
to a number of possible regions per sequence pair. If these 
residue pairs are consistent with each other, they are 
considered acceptable. Consistency means that if symbol 
$A_i$ (residue $i$ of sequence A) is aligned correctly with 
symbol $B_j$, and $B_j$ with $C_k$, then $A_i$ and $C_k$ should 
also be aligned. Therefore, this property can be used to 
define the consistent parts among all the pair-wise 
alignments, which can be considered acceptable, and the gap 
positions can be defined over the rest of the aligned residue 
pairs. 
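
The consistency property can be illustrated with a short Python sketch. It assumes that each pair-wise alignment is available as a set of (index, index) residue pairs and that a residue is paired with at most one residue of the other sequence; the function name `consistent_triples` and the example data are hypothetical.

```python
def consistent_triples(ab, bc, ac):
    """Return (i, j, k) such that A_i~B_j, B_j~C_k and A_i~C_k all hold.

    ab, bc and ac are sets of residue-index pairs taken from the
    pair-wise alignments A/B, B/C and A/C respectively.
    """
    b_to_c = dict(bc)  # each B residue maps to at most one C residue
    return [(i, j, b_to_c[j])
            for i, j in sorted(ab)
            if j in b_to_c and (i, b_to_c[j]) in ac]


ab = {(0, 0), (1, 1), (2, 3)}
bc = {(0, 0), (1, 2), (3, 3)}
ac = {(0, 0), (1, 1), (2, 3)}
triples = consistent_triples(ab, bc, ac)
```

In the example, A_1 is paired with B_1 and B_1 with C_2, but the A/C alignment pairs A_1 with C_1, so that path is rejected and only the triples (0, 0, 0) and (2, 3, 3) remain as candidates for acceptable columns.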

The ability to determine the well-aligned regions has at 
least two advantages. It prevents the same region from being 
changed in the later process. Additionally, it speeds up the 
optimization process. The modified steps of the HS-MSA 
algorithm can be summarized as follows: 

1. Find all possible residue pairs in each sequence pair using 
the pair-wise algorithm. 

2. By using the consistency concept, find all possible blocks 
or columns that are acceptable. 

3. Calculate the score value for each block by using the sum- 
of-pairs objective function. 

4. Identify and analyze the potentially useful blocks, and 
select those that are more consistent with each other. 

5. Apply the HS algorithm to initialize the final alignment 
from these blocks and find the optimal alignment. 

3.2.2 A Harmony Search algorithm with Segment-based and 
Divide-and-conquer Approaches 
The previously proposed method can be extended by 
combining it with the divide-and-conquer (DAC)[145] method. 

Sammeth et al.[146] and Kryukov and Saitou[124] used 
the DCA approach in solving MSA. Kryukov and Saitou[124] 
produced an adapted DCA in which k-tuples are used to find the 
segments, which are then aligned by CLUSTALW and MAFFT. 
Sammeth et al.[146], on the other hand, integrated the global 
divide-and-conquer approach with the local segment-based 
approach as in DIALIGN. 

A set of consistent columns can form segments in the 
alignment. The DCA protocol cuts the sequences at a point and 
repeats that cutting procedure until the sub-sequences no 
longer exceed a given length. The obtained sub-sequences are 
then aligned independently and the results are combined to 
form a complete multiple alignment. The method proceeds as 
follows: 



1. Find all possible residue pairs in each sequence pair using 
the pair-wise algorithm. 

2. By using the consistency concept, find all the possible 
blocks or columns that are acceptable. 

3. Calculate the score value for each column by using the 
sum-of-pairs objective function. 

4. Identify and analyze the potentially useful columns, and 
select those that are more consistent with each other. 

5. Add these conserved blocks/fragments to the fragment set 
F; they can be considered as cutting points. 

6. Divide each sequence into sub-sequences based on these 
cutting points. 

7. Apply the HS algorithm to construct the final alignment 
from these regions and find the optimal one. 
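
Step 6 of the list above can be sketched in Python as follows. The sketch assumes that the conserved blocks of one sequence are given as non-overlapping (start, length) pairs with 0-based start positions; the function name `split_at_blocks` and the labels 'free' and 'block' are hypothetical.

```python
def split_at_blocks(seq, blocks):
    """Cut seq into alternating free regions and conserved blocks.

    blocks is a list of (start, length) pairs marking the conserved
    blocks that act as cutting points for this sequence.
    """
    parts, pos = [], 0
    for start, length in sorted(blocks):
        if start < pos:
            raise ValueError("blocks must not overlap")
        parts.append(('free', seq[pos:start]))       # region left to align
        parts.append(('block', seq[start:start + length]))
        pos = start + length
    parts.append(('free', seq[pos:]))
    return parts


parts = split_at_blocks("AAGGCUUACG", [(2, 3), (7, 2)])
```

Only the 'free' regions would then need to be aligned by the HS step, since the 'block' regions are already fixed by the consistency analysis.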

C. A Harmony Search Algorithm Improver for MSA (HSI- 
MSA) 

Another method proposed in our research is HSI-MSA, which 
combines many multiple alignments into one improved 
alignment. Any conventional MSA program, or a combination of 
them, can initialize the harmony memory. The harmony search 
algorithm is then applied as an iterative method to refine and 
combine the alignments in order to find the best alignment. 
Here HS takes on the role of an improver of the accuracy of the 
current alignment. The goal of this study is to investigate 
whether this approach improves the accuracy of the different 
alignments. This improver idea is similar to the PASA 
algorithm[97], which used a genetic algorithm model to combine 
the alignment outputs of two MSA programs, M-Coffee and 
ProbCons. It has also been used in ComAlign[147], 
M-Coffee[148], and AQUA[52]. The proposed method can be 
summarized as follows: 

1. Initialize the harmony memory by using well-known MSA 
algorithms, including our alignment gained from the 
previous step. 

2. Calculate the score for each alignment. 

3. Apply the HS algorithm to improve and find the optimal 
alignment. 

This will combine all the alignment parts from the different 
alignments to find the optimal alignment within them and not 
just to select the best of them. 

D. A Parallel Harmony Search Algorithm for MSA (PHS- 
MSA) 

In addition to the foregoing proposed methods, another way 
to reduce the computational complexity and the time consumed 
is to parallelize the HS-MSA algorithm using multi-core and 
multi-GPU platforms. 

CUDA (Compute Unified Device Architecture) is an 
extension of C/C++ developed by NVIDIA to run thousands of 
threads in parallel[149] on GPUs[150]. GPU architectures are 
"manycore", with hundreds of cores[149]. GPUs were 
implemented as a streaming processor. 

They are a good alternative for high performance computing 
and are expected to improve further in the near future. 
Furthermore, availability, low price, and easy installation are 
the main advantages[151] of GPUs compared to other 
architectures. 

Re-developing the algorithm and the data structure based 
on computer graphics concepts is the main obstacle facing the 
use of GPUs[151],[152]. Moreover, other limitations of the 
streaming architecture have to be taken into consideration 
(i.e. memory random access, cross fragment, persistent state). 

Many researchers have presented the design and 
implementation of bioinformatics algorithms using GPUs. 
Examples that use GPUs to parallelize sequence alignment 
algorithms in bioinformatics are[153],[154],[151],[155],[156], 
[157]. 

Our approach is motivated by the rapidly increasing power 
of GPUs. We propose to implement the HS-MSA algorithm on 
NVIDIA GPUs in order to explore and develop high performance 
solutions for multiple sequence alignment. The HS-MSA will be 
implemented in CUDA on an NVIDIA GeForce 9400 GT. The 
computation will be conducted on NVIDIA GPUs installed in a 
computer with a 2.66 GHz Intel Core 2 Quad CPU and 3 GB RAM, 
running Microsoft Windows XP Professional. 

Moreover, to utilize multiple CPU threads and to incorporate 
GPU devices into a single program, the proposed method can be 
extended to hybrid multi-core and GPU code using CUDA and 
OpenMP. This can lead to quicker implementation and greater 
efficiency on both the GPU and the multi-core CPU[158]. 

IV. Evaluation and Analysis 

To evaluate and analyse the performance of the proposed 
HS-MSA algorithm in greater depth, an objective criterion is 
needed to assess the quality of the aligned sequences. The 
quality attained can be evaluated by comparing the test 
alignment with the reference alignment[139]. 

The comparison can use scores that are dependent on the 
alignment itself (e.g., Sum-of-Pairs, Total Column Score) or 
independent of it (structure sensitivity and selectivity). 
This section describes in detail the benchmark dataset, the 
reference comparison, the alignment comparison, and the 
structure comparison, which can be used to evaluate the test 
alignments. 

A. Benchmark Dataset 

The proposed algorithm will be tested using the following 
datasets: Rfam, BRAliBase 2.1, Comparative RNA Website 
(CRW), the Ribonuclease P database, 5S Ribosomal RNA 
database, tmRNA, tRNA, SRPDB, RNAdb, and ncRNA, as 
explained in section 2.6. Different RNA datasets will be used 
from a variety of families and lengths such as 5S 
(5S.B.alphaproteobacteria, 5S.B.betaproteobacteria, 



B. Reference Comparison 

To assess the quality of the aligned sequences, a reference 
alignment from the benchmark database is required. The 
comparison is between the test alignment and the reference 
alignment. 

The sum-of-pairs score (SPS) and the column score (CS) are two 
different score functions that can be used to quantify this 
comparison. The SPS score is the percentage of correctly 
aligned residue pairs in the test alignment that occur in the 
reference alignment[159]. The CS score is the percentage of 
entire columns in the test alignment that occur completely in 
the reference alignment[159]. 

In a given test alignment consisting of M columns, the ith 
column is denoted by $A_{i1}, A_{i2}, \ldots, A_{iN}$, where N is 
the number of sequences. For each pair of residues $A_{ij}$ and 
$A_{ik}$, $p_i(j,k)$ is defined such that $p_i(j,k) = 1$ if residues 
$A_{ij}$ and $A_{ik}$ from the test alignment are aligned with each 
other in the reference alignment; otherwise $p_i(j,k) = 0$. The 
score $S_i$ of the ith column can be calculated as follows: 

$$S_i = \sum_{j=1,\, j \neq k}^{N} \; \sum_{k=1}^{N} p_i(j,k).$$



Then, the sum-of-pairs score for a given test alignment can 
be calculated as follows: 

$$\mathrm{SPS} = \frac{\sum_{i=1}^{M} S_i}{\sum_{i=1}^{M_r} S_{ri}},$$

where $M_r$ is the number of columns in the reference 
alignment and $S_{ri}$ is the score $S_i$ for the ith column in the 
reference alignment. 

Column score (CS): Using the same symbols as above, the 
score $C_i$ of the ith column is equal to 1 if all the residues 
in that column are aligned in the reference alignment; 
otherwise it is equal to 0. Therefore, the column score is: 

$$\mathrm{CS} = \frac{\sum_{i=1}^{M} C_i}{M}.$$



To compare the test alignment with the corresponding 
reference alignment, the sum-of-pairs function and column 
score are used as described in[139],[107],[160],[161],[162]. 
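
The two scores can be illustrated with a minimal Python sketch. It is 
not the evaluation code of the cited benchmarks: the function names 
(`residue_columns`, `aligned_pairs`, `sps_and_cs`) and the toy 
alignments are assumptions made for illustration, and alignments are 
assumed to be lists of equal-length strings with "-" for gaps. Each 
unordered residue pair is counted once; the double sum in $S_i$ counts 
every pair twice in both numerator and denominator, so the ratio is 
unchanged. 

```python
from itertools import combinations


def residue_columns(alignment):
    """Per column, list the residue ids (sequence index, residue index); None marks a gap."""
    counters = [0] * len(alignment)
    columns = []
    for col in range(len(alignment[0])):
        ids = []
        for s, row in enumerate(alignment):
            if row[col] == "-":
                ids.append(None)
            else:
                ids.append((s, counters[s]))
                counters[s] += 1
        columns.append(ids)
    return columns


def aligned_pairs(columns):
    """Set of residue pairs that share a column."""
    pairs = set()
    for ids in columns:
        present = [r for r in ids if r is not None]
        pairs.update(combinations(present, 2))
    return pairs


def sps_and_cs(test, reference):
    """Return (SPS, CS) of a test alignment against a reference alignment."""
    test_cols = residue_columns(test)
    ref_cols = residue_columns(reference)
    ref_pairs = aligned_pairs(ref_cols)
    shared = aligned_pairs(test_cols) & ref_pairs
    sps = len(shared) / len(ref_pairs)
    # Reference column of every residue, used to check whole test columns.
    ref_col_of = {r: c for c, ids in enumerate(ref_cols) for r in ids if r is not None}
    correct = 0
    for ids in test_cols:
        # C_i = 1 when all residues of the test column share one reference column.
        if len({ref_col_of[r] for r in ids if r is not None}) <= 1:
            correct += 1
    return sps, correct / len(test_cols)


reference = ["ACGU", "A-GU", "AC-U"]
test = ["ACGU", "AG-U", "AC-U"]
print(sps_and_cs(test, reference))
```

In this toy case seven of the eight reference pairs are recovered and 
three of the four test columns are correct. 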

C. Alignment Comparison 

This comparison evaluates the performance of the 
proposed algorithm with respect to other MSA aligners. 
Typically, MSA aligners are validated using a 
benchmark data set of reference alignments. 

The sum-of-pairs (SPS) and column (CS) scores of every 
alignment produced by each aligner, including our proposed 
algorithm, are computed against the reference alignment and 
then compared. 

The proposed algorithm HS-MSA can be compared to the 
commonly used MSA programs on the above reference 
alignment benchmark. 

D. Structure Comparison 

It might be expected that a more accurate alignment would 
lead to a more accurate RNA secondary structure. The 
proposed method investigates the impact of alignment 
accuracy on the accuracy of the RNA secondary structure using 
standard benchmarks and compares the results with those of 
common, well-known MSA algorithms. 

Both the alignment process and the prediction process can 
affect the accuracy of the secondary structure prediction, but 
here only the alignment process is investigated. 

The evaluation is performed with respect to sensitivity, 
selectivity or positive predictive value (PPV), and the Matthews 
correlation coefficient (MCC) of the RNA secondary structure 
as used by Gardner and Giegerich[163]. The secondary 
structure of the test alignment produced by the proposed 
algorithm will be compared with that of others. The sensitivity 
and selectivity of the alignment process will be studied to 
investigate the effect of the proposed aligner on the accuracy of 
the structure as shown in Figure 7. 
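
As a hedged illustration of these three measures, the sketch below 
uses the plain textbook definitions over sets of base pairs; the cited 
work may use a refined variant (for example in how some false 
positives are treated), which is not modelled here. The function name 
`structure_accuracy`, the base-pair encoding as index tuples, and the 
example pairs are assumptions. 

```python
from math import sqrt


def structure_accuracy(predicted, reference, length):
    """Sensitivity, PPV and MCC of predicted base pairs for a sequence of given length."""
    predicted, reference = set(predicted), set(reference)
    tp = len(predicted & reference)          # pairs found in both structures
    fp = len(predicted - reference)          # predicted but not in the reference
    fn = len(reference - predicted)          # reference pairs that were missed
    tn = length * (length - 1) // 2 - tp - fp - fn   # all remaining possible pairs
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    ppv = tp / (tp + fp) if tp + fp else 0.0
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, ppv, mcc


reference_pairs = [(0, 9), (1, 8), (2, 7)]
predicted_pairs = [(0, 9), (1, 8), (3, 6)]
print(structure_accuracy(predicted_pairs, reference_pairs, length=10))
```

Here two of the three reference pairs are recovered, so sensitivity 
and PPV are both 2/3. 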
Figure 7. Structure comparison 

V. Conclusion 

Multiple sequence alignment is a fundamental technique in 
many bioinformatics applications. Many algorithms have been 
developed to achieve optimal alignment. Some programs are 
exhaustive in nature; some are heuristic. Because exhaustive 
programs are not feasible in most cases, heuristic programs are 
commonly used. These include progressive, iterative, and 
block-based approaches. 

This paper briefly describes the basic concepts of MSA and 
reviews the common approaches to MSA. To this end, this 
paper proposes a novel meta-heuristic method to solve the 
MSA problem. A meta-heuristic algorithm (HS-MSA), which 
has not been used for this purpose up to now, is proposed for 
multiple sequence alignment; it promises to greatly speed up 
the alignment process and improve its accuracy. The 
optimization method introduced herein is inspired by the 
so-called harmony search algorithm (HS). A new optimization 
algorithm that combines HS-MSA with segment-based multiple 
alignment is also proposed and extended to include parallel 
techniques. 

Abstract: Aligning multiple biological sequences, such as 
protein or DNA/RNA sequences, is a fundamental task in 
bioinformatics and sequence analysis. In functional, structural, 
and evolutionary studies of sequence data, the role of multiple 
sequence alignment (MSA) cannot be denied. Accurate alignment 
is imperative when predicting RNA structure. MSA is a major 
bioinformatics challenge as it is NP-complete. In addition, the 
lack of a reliable scoring method makes it harder to align the 
sequences and evaluate the alignment outcomes. Scalability, 
biological accuracy, and computational complexity must be taken 
into consideration when solving the MSA problem. The harmony 
search algorithm is a recent meta-heuristic method which has 
been successfully applied to a number of optimization problems. 
In this paper, an adapted harmony search algorithm (HS-MSA) 
is proposed to solve the MSA problem. In addition, a 
hybrid method of finding the conserved regions using the 
Divide-and-Conquer (DAC) method is proposed to reduce the 
search space. The proposed method (HS-MSA) is extended to a 
parallel approach in order to exploit the benefits of multi-core 
and GPU systems so as to reduce computational complexity and 
time. 

Keywords: RNA, Multiple sequence alignment, Harmony search 
algorithm. 

I. Introduction 

Living organisms are related to each other through 
evolution. A pair of organisms sometimes has a common 
ancestor from which both evolved. MSA tries 
to discover the similarities among the sequences and recover the 
mutations that took place. 

A sequence is an ordered list of symbols drawn from an 
alphabet S (20 amino acids for proteins and 4 
nucleotides for RNA/DNA). In bioinformatics, an RNA 
sequence is written as s = AUUUCUGUAA. It is a string of 
nucleotide symbols comprising adenine (A), cytosine (C), 
guanine (G), and uracil (U): S = {A, C, G, U}. 

Alignment is a method of arranging the sequences one over 
the other to show the matches and mismatches between the 
residues. A column of matching residues shows that no 
mutation has occurred, whereas a column with mismatched 
symbols indicates that mutation events have occurred. 
To improve the alignment score, the character "-" is used to 
represent a space introduced in the sequence. This space is 
usually called a gap. A gap is viewed as an insertion in one 
sequence and a deletion in the other. A score is used to measure 
the alignment quality. The highest score, one, indicates 
the best alignment. 
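
The notions of match, mismatch, and gap columns can be made concrete 
with a small Python sketch. The function name `classify_columns` and 
the second sequence are assumptions chosen for illustration; only the 
sequence AUUUCUGUAA comes from the text. 

```python
def classify_columns(aligned):
    """Label each alignment column as 'match', 'mismatch', or 'gap'."""
    labels = []
    for column in zip(*aligned):
        if "-" in column:
            labels.append("gap")            # insertion/deletion event
        elif len(set(column)) == 1:
            labels.append("match")          # identical residues, no mutation
        else:
            labels.append("mismatch")       # substitution
    return labels


aligned = ["AUUUCUGUAA",
           "AUU-CUGCAA"]
print(classify_columns(aligned))
```

For this pair, the fourth column is a gap column and the eighth is a 
mismatch; all other columns are matches. 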

For clarity's sake, the generic MSA problem is expressed 
using the following statement: "Insert gaps within a given set 
of sequences in order to maximize a similarity criterion"[1]. 
Finding an accurate MSA from the sequences is very difficult. 
It is a time-consuming, NP-hard computational 
problem[2, 3]. The MSA problem can be divided into three 
difficulties, that is, scalability, optimization, and the objective 
function. 

In fact, the complexity that arises from all three 
problems must be addressed simultaneously. The first problem, 
scalability, is about finding the alignment of many long 
sequences. The second problem, optimization, deals with 
finding the alignment with the highest score based on a given 
objective function among the sequences. Optimization of even 
a simple objective function is an NP-hard problem. The third 
problem, the objective function (OF), involves speeding up the 
calculation used to measure the alignment. 

MSA covers two closely related problems: global MSA and 
local MSA. Global MSA aligns sequences across their whole 
length while local MSA aligns certain parts of the sequences, 
and locates conserved regions along with them as shown in 
Figure 1. 



Figure 1. Global and local MSA 

In bioinformatics, MSA is a problem of major interest and 
constitutes the basis for other molecular biology analyses. 
MSA has been used to address many critical problems in 
bioinformatics. Studying these alignments provides scientists 
with the information needed to determine the evolutionary 
relationships between sequences, find the sequences of a family, 
detect the structure of protein/DNA, reveal sequence 
homologies, predict the functions of protein/DNA sequences, 
and predict patients' diseases or discover drug-like 
compounds that can bind to the sequences. 

In general, MSA is the primary step in secondary structure 
prediction, particularly in the prediction of the structure of 
RNA sequences. RNA structure prediction is strongly affected 
by the quality of the alignment[4]. Indeed, prediction of an 
accurate RNA secondary structure relies on multiple sequence 
alignments to provide data on co-varying bases[5]. MSA 
significantly improves the accuracy of protein/RNA structure 
prediction. For example, current RNA secondary structure 
prediction methods using aligned sequences have achieved 
higher prediction accuracy than those using a single 
sequence[6]. Nucleic acid sequences are of primary concern in 
our proposed method, which aims to evaluate and improve the 
influence of alignment tools on RNA secondary structure 
prediction. 

Many different approaches have been proposed to solve the 
MSA problem. Dynamic programming, progressive, iterative, 
consistency, and segment-based approaches are the most 
commonly used[7]. Although many MSA 
algorithms are available, a solution has yet to be found that is 
applicable to all possible alignment situations[7]. 

It is a well-known fact that the MSA problem can be solved 
by using the dynamic programming (DP) algorithm[8, 9]. 
Unfortunately, such an approach is notorious for its large 
consumption of processing time. MSA with the sum-of-pairs 
score has been shown to be an NP-complete 
problem[10],[11]. Algorithms that provide the optimal solution 
are time-consuming and have a running time that grows 
exponentially with the number of sequences and 
their lengths. 
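
To make the cost argument concrete, the following Python sketch shows 
the two-sequence case of dynamic programming (a Needleman-Wunsch style 
global alignment score with a linear gap penalty) and counts the cells 
of the full DP lattice for N sequences. The scoring values and the 
function names `needleman_wunsch_score` and `dp_cells` are assumptions 
made for illustration, not part of the proposed method. 

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score of two sequences (linear gap penalty)."""
    prev = [j * gap for j in range(len(b) + 1)]       # first DP row: only gaps
    for i in range(1, len(a) + 1):
        curr = [i * gap]                              # first DP column: only gaps
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def dp_cells(lengths):
    """Number of cells in the full DP lattice for sequences of these lengths."""
    cells = 1
    for n in lengths:
        cells *= n + 1
    return cells


print(needleman_wunsch_score("AUUCG", "AUCG"))   # pairwise case
print(dp_cells([100, 100]))                      # two sequences of length 100
print(dp_cells([100] * 5))                       # five sequences of length 100
```

Two sequences of length 100 need about ten thousand cells, whereas 
five such sequences already need more than ten billion, which is why 
exact DP is limited to very small inputs. 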

In essence, all widely used MSA tools seek an alignment 
with a high sum-of-pairs score. This optimization problem is 
NP-complete[2, 3], which motivates research into 
heuristics. Over the last decade, evolutionary and 
meta-heuristic approaches have been among the most recent 
approaches used to solve optimization problems. 
Evolutionary and meta-heuristic algorithms have been used in 
several problem domains, including science, commerce, and 
engineering. Consequently, most practical MSA 
algorithms are based on heuristics that obtain a reasonably 
accurate, usually quasi-optimal, alignment within a moderate 
computational time. Although 
many algorithms are now available, there is still room to 
improve their computational complexity, accuracy, and 
scalability. 

In this paper, a novel algorithm (HS-MSA), based on the 
meta-heuristic technique known as the harmony search 
algorithm, is proposed to solve the long-standing MSA problem. 
The MSA problem is viewed as an optimization problem that can 
be solved by adapting the harmony search algorithm. Since the 
search space in HS is wide, a modified algorithm (MHS-MSA) is 
proposed to find the conserved blocks using well-known regions 
and then align the mismatched regions between successive 
blocks to form a final alignment. HS-MSA is extended to include 
the divide-and-conquer (DCA) approach, in which DCA is used to 
cut and combine the sub-sequences to form the final MSA. 
Another proposed technique is to use the harmony search 
algorithm as an MSA improver (HSI-MSA), in which the initial 
alignment can be obtained from conventional algorithms or 
their combinations. HS-MSA can be extended to a parallel 
algorithm (PHS-MSA) in order to exploit the benefits of 
multi-core and GPU systems to reduce computational 
complexity and time. 

This paper is organized as follows: Section 2 reviews the 
related literature and describes the state-of-the-art MSA 
approaches. Section 3 explains the proposed algorithm. The 
evaluation and analysis methodology that is used to assess our 
proposed algorithm is explained in Section 4. Lastly, Section 5 
provides the conclusion and summary of the paper. 

II. Literature Review 

Several MSA algorithms have been reported in the literature. 
For a deeper understanding of the MSA algorithms, 
the basic concepts of MSA alignment representation, gap 
penalty, alignment scores, dataset benchmarks, MSA 
approaches, and the harmony search algorithm need to be 
understood. As such, subsection 2.1 briefly reviews the 
representation of MSA alignment, followed by the details of the 
gap penalty in subsection 2.2. The alignment scores, RNA 
datasets and benchmarks, and current MSA approaches are 
explained in subsections 2.3, 2.4, and 2.5 respectively. 
Subsection 2.6 provides a summary of the MSA algorithms, and 
the review concludes with the harmony search algorithm in 
subsection 2.7. 

A. Representation of MSA Alignment 

There are several ways to represent a multiple sequence 
alignment. Usually, the final alignment is a listing of 
the entire sequences arranged one over the other. However, 
during the alignment process, it is helpful to encode the 
alignment in an internal form known as a representation. Some 
of the representations that have been used in previous 
algorithms include a bit matrix as used in[12], a matrix of gap 
positions as used in[13], multiple number-strings as used 
in[14],[15],[16],[17], string representation[18],[19],[20] as used 
in SAGA[18], four parallel chromosomes as used in[21], 
directed acyclic graph (DAG) as used in[22, 23], A-Bruijn 
graph as used in[24-26], and dispersion graph as used in[27]. 
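
As a rough illustration of two of these encodings, the sketch below 
converts a gapped alignment into a bit matrix and into lists of gap 
positions, and rebuilds the alignment from the latter. The exact 
encodings used in[12] and[13] may differ; the convention (1 for a 
residue, 0 for a gap) and the function names are assumptions. 

```python
def to_bit_matrix(aligned):
    """1 marks a residue, 0 marks a gap (assumed convention)."""
    return [[0 if ch == "-" else 1 for ch in row] for row in aligned]


def to_gap_positions(aligned):
    """For every row, the column indices that hold a gap."""
    return [[i for i, ch in enumerate(row) if ch == "-"] for row in aligned]


def from_gap_positions(sequences, gap_positions):
    """Rebuild the aligned rows from the ungapped sequences and gap indices."""
    rows = []
    for seq, gaps in zip(sequences, gap_positions):
        residues = iter(seq)
        gap_set = set(gaps)
        width = len(seq) + len(gaps)
        rows.append("".join("-" if i in gap_set else next(residues)
                            for i in range(width)))
    return rows


aligned = ["AUU-CG", "A-UUCG"]
print(to_bit_matrix(aligned))
print(to_gap_positions(aligned))
print(from_gap_positions(["AUUCG", "AUUCG"], [[3], [1]]))
```

Compact encodings of this kind let a search algorithm move gaps 
around without ever changing the order of the residues. 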

B. Gap Penalty 

A negative score or a penalty can be assigned to a set of 
gaps. Two gap models, which were mentioned in previous 
reviews[28], are defined as follows: 

Linear gap model - in this model a gap is always given 
the same penalty wherever it is placed in the alignment. 
The penalty is proportional to the length of the gap and is 
given by $\mathrm{gap} = n \times g_o$, where $g_o < 0$ is the 
opening penalty of a gap and n is the number of consecutive 
gaps. 

Affine gap model - in this model a new gap and a gap 
extension are not given the same penalty. The 
insertion of a new gap has a greater penalty than the 
extension of an existing gap; the penalty is given by 
$\mathrm{gap} = g_o + (n - 1) \times g_e$, where $g_o < 0$ is the 
gap opening penalty and $g_e < 0$ is the gap extension penalty, 
such that $|g_e| < |g_o|$. 

C. Alignment Score 

The MSA objective function is defined for assessing the 
alignment quality either explicitly or implicitly. An efficient 
algorithm is used to find the optimal or a near-optimal 
alignment according to the objective function. Matches, 
mismatches, substitutions, insertions, and deletions need to be 
scored in the scoring function. The scoring function can be 
divided into two parts: substitution matrices and gap penalties. 
The former provides a numerical score for matches and 
mismatches, while the latter allows for numerical quantification 
of insertions and deletions. All possible transitions between the 
20 amino acids or the 4 nucleotides are represented in a 
substitution matrix, which is a two-dimensional array of size 
20 x 20 for amino acids and 4 x 4 for nucleic acids. 

Usually, a simple matrix used for DNA or RNA sequences 
assigns a positive value to a match and a negative 
value to a mismatch[20]. Meanwhile, the scores for aligned 
protein residues are given as log-odds[29] substitution matrices 
such as PAM[30], GONNET[31], or BLOSUM[32]. 

There are several models for assessing the score of a given 
MSA, and many MSA tools have adopted them. A 
brief review of the scoring methods that have been used to 
calculate the alignment score is as follows: 

Sum-of-Pairs (SP): It was introduced by Carrillo and 
Lipman[10]. More details about the sum-of-pairs will be 
presented later. 

Weighted sum-of-pairs score[33],[34]: The weighted 
sum-of-pairs (WSP) score is an extension of the SP score in 
which each pair-wise alignment score contributes differently 
to the whole score. 

Maximal expected accuracy (MEA)[35]: The basic idea of 
MEA is to maximize the expected number of "correctly" 
aligned residue pairs[36]. It has been used in the PRIME[37] 
and ProbCons[38] algorithms. 


Consistency-based Scoring: This consistency concept was 
originally introduced by Gotoh [9] and later refined by 
Vingron and Argos[39]. Consistency-based scoring is used 
in T-Coffee[40], MAFFT[41], and Align-m[42] 
algorithms. 

Probabilistic consistency Scoring function: This scoring 
function is introduced in ProbCons[38]. It is a novel 
modification of the traditional sum-of-pairs scoring 
system. This promising idea is implemented and extended 
in the PECAN[43], MUMMALS[44], PROMALS[45], 
ProbAlign[46] , ProDA[47], and PicXAA[48] programs. 

Segment-to-segment objective function: It is used by 
DIALIGN[49] to construct an alignment through 
comparison of the whole segments of the sequences rather 
than the residue-to-residue comparison. 

NorMD[50] objective function: It is a conservation-based 
score which measures the mean distance between the 
similarities of the residue pairs at each alignment column. 
NorMD is used in RASCAL[51] and AQUA[52]. 

MUSCLE profile scoring function: MUSCLE[53] uses a 
scoring function which is defined for a pair of profile 
positions. In addition to PSP, MUSCLE uses a new profile 
function which is called the log-expectation (LE) score. 
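
To show how a substitution score and a gap penalty combine into an 
alignment score, the following Python sketch computes a sum-of-pairs 
score with the affine gap model of subsection B and a simple 
match/mismatch scheme. The numeric values (match = 1, mismatch = -1, 
gap opening = -4, gap extension = -1), the function names, and the 
treatment of columns where both sequences have a gap (they are skipped) 
are assumptions made for illustration, not values taken from the cited 
tools. 

```python
from itertools import combinations


def pair_score(a, b, match=1, mismatch=-1, gap_open=-4, gap_extend=-1):
    """Score one pairwise projection of an MSA with affine gap penalties."""
    score = 0
    gap_in = None                     # row (0 or 1) owning the currently open gap
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue                  # column of two gaps carries no information
        if x == "-" or y == "-":
            side = 0 if x == "-" else 1
            # First gap column costs gap_open, each further one gap_extend.
            score += gap_extend if gap_in == side else gap_open
            gap_in = side
        else:
            score += match if x == y else mismatch
            gap_in = None
    return score


def sum_of_pairs(alignment, **params):
    """Sum the pairwise scores over all pairs of rows."""
    return sum(pair_score(a, b, **params) for a, b in combinations(alignment, 2))


alignment = ["AUUUCUG",
             "AU--CUG",
             "AUU-CAG"]
print(sum_of_pairs(alignment))
```

A weighted sum-of-pairs score would multiply each `pair_score` by a 
pair-specific weight before summing. 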

D. RNA Database and Benchmarks 

Typically, a benchmark of reference alignments is used to 
validate an MSA program. The accuracy score is obtained by 
comparing the alignment produced by the program (the test 
alignment) with the corresponding reference alignment. Most 
alignment programs have been extensively investigated for 
proteins. To date, few attempts have been made to benchmark 
nucleic acid sequences. 

RNA reference alignments exist in several databases. It 
must be noted that although these databases provide a 
substantial amount of information to the specialist, they 
differ in the file formats used and the data obtained. A 
brief review of the benchmarks and databases that have been 
used for multiple RNA sequence alignment is given in 
Table I. 



TABLE I. Database and Benchmarks 

| RNA Database | Description | Website |
|---|---|---|
| Rfam[54],[55] | It is a compilation of alignment and covariance models including many regular non-coding RNA families[55]. | http://rfam.sanger.ac.uk/ ; http://rfam.janelia.org/index.html |
| BRAliBase[56],[57] | It is a compilation of RNA reference alignments especially designed for the benchmark of RNA alignment methods[57]. | http://www.biophys.uni-duesseldorf.de/bralibase/ ; http://projects.binf.ku.dk/pgardner/bralibase/ |
| Comparative RNA Website (CRW)[58] | It has alignments for rRNA (5S / 16S / 23S), Group I intron, Group II intron, and tRNA for various organisms[58]. | http://www.rna.ccbb.utexas.edu/ |
| European Ribosomal RNA Database[59],[60] | It is a collection of all complete or nearly complete SSU (small subunit) and LSU (large subunit) ribosomal RNA sequences available from public sequence databases[60]. | http://bioinformatics.psb.ugent.be/webtools/rRNA/ |
| The Ribonuclease P Database[61] | It contains a collection of sequence alignments, RNase P sequences, three-dimensional models, secondary structures, and accessory information[61]. | http://www.mbio.ncsu.edu/RnaseP/ |
| 5S Ribosomal RNA Database[62] | 5S rRNA is found in the large subunit of most organellar ribosomes and all cytoplasmic ribosomes. This database is intended to provide information on nucleotide sequences of 5S rRNAs and their genes[62]. | http://biobases.ibch.poznan.pl/5SData/ |
| tmRNA[63] | It contains a compilation of tmRNA (also known as 10Sa RNA or SsrA) sequences, alignments, secondary structures, and other information. It shows secondary structure, together with careful documentation[63]. | http://www.indiana.edu/~tmrna/ |
| The tmRDB (tmRNA database)[64] | tmRDB provides aligned, secondary, and tertiary structure of each tmRNA molecule. The alignment is available in several formats. | http://www.ag.auburn.edu/mirror/tmRDB/ |
| RNAdb[65],[66] | It provides sequences and annotations for tens of thousands of non-coding RNAs. | http://research.imb.uq.edu.au/rnadb/default.aspx |
| Noncoding RNA (ncRNA) database[67] | It provides information on non-coding RNA sequences and the functions of transcripts (non-coding RNA does not code for proteins, but performs regulatory roles in the cell). | http://biobases.ibch.poznan.pl/ncRNA/ |


E. Current MSA Approaches 

Much research on MSA algorithms has been published in 
the last thirty years and reviewed by several researchers, such 
as[7],[68],[69],[70]. The published algorithms vary in the order 
in which the sequences are aligned 
and in the procedure used to align and score the sequences. 
Existing algorithms can be classified into one or a combination 
of the following basic approaches: exact, progressive, iterative, 
group alignment, block-based, consistency-based, 
probabilistic, computational intelligence, and heuristic. The 
following subsections provide a brief overview of the 
consistency-based, block-based, and heuristic optimization 
approaches, which are related in one way or another to our 
proposed work. The consistency-based approach 
is explained in subsection 2.5.1, followed by the block-based 
approach in subsection 2.5.2. Finally, the heuristic 
optimization approach is explained in subsection 2.5.3. 

1) Consistency-based Approach 
The "consistency-based" approach is one of the strategies 
that has been proposed to improve the MSA scoring function. 
This approach tries to reduce the chance of early errors when 
constructing the alignment instead of correcting the existing 
errors via post processing[40],[38]. This is typically achieved 
by improving the pair-wise sequence quality based on other 
sequences in the alignment so as to obtain pair-wise alignments 
that are consistent with one another. This consistency strategy 
was originally described by Gotoh[9] and later refined by 
Vingron and Argos[39]. This strategy has been modified by 
several methods since then. 

SAGA[18] optimized alignments with COFFEE, an objective 
function based on a consistency measure (the 
consistency-based objective function). 

Later, Dialign2[71] presented a consistency-based 
method incorporating the segment-by-segment approach. 

Similarly, Align-m[42] used local alignments as a guide to 
a non-progressive global alignment. Align-m used 
pair-wise alignment consistency to find the parts that are 
consistent with each other. 

T-Coffee[40] also implemented this idea by using a 
consistency-based alignment measure based on a library of 
pair-wise alignments. This method was later brought into a 
probabilistic framework by ProbCons[38], MUMMALS[44], 
ProbAlign[46], PROMALS[45], and MSAProbs[72]. 

Nonetheless, a combination of different strategies can be 
used. For instance, PCMA[73] (profile consistency multiple 
sequence alignment) combined two different alignment 
strategies, that is, progressive and consistency approaches. 

2) Block-based Approach 

Block-based MSA is a method in which an alignment is 
constructed by first identifying the conserved regions, which 
are called "blocks". Then, the regions between successive 
blocks are aligned to form a final alignment[74]. Block-based 
methods can be included in the consistency-based or 
probability-based approach. A block can refer to a sub-sequence, 
a segment, a region, or a fragment[76]. A fragment is defined 
as a pair of ungapped segments of the input sequences[77]. A 
weight score is assigned to each possible fragment in order to 
find the consistent fragments with a high overall sum of 
fragment scores. Those fragments are integrated from a 
pair-wise alignment into a multiple alignment. 

Searching for these conserved blocks in many block-based 
methods is very time-consuming. Therefore, the key 
issue is how to construct the possible set of blocks 
efficiently[75]. 
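
A very simple way to obtain candidate blocks, shown here only as a 
sketch and not as the method of any cited tool, is to look for 
ungapped words of length k that occur in every sequence. The function 
name `shared_kmers`, the choice of exact matching, and the example 
sequences are assumptions; real block finders tolerate mismatches and 
use more efficient indexing. 

```python
def shared_kmers(sequences, k):
    """Return the k-mers present in every sequence with their first positions."""
    common = None
    for seq in sequences:
        kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        common = kmers if common is None else common & kmers
    # For each shared word, record where it first occurs in each sequence.
    return {kmer: [seq.find(kmer) for seq in sequences]
            for kmer in sorted(common or [])}


sequences = ["GGAUCCGAUUA", "AAUCCGAUGG", "CAUCCGAUA"]
for kmer, positions in shared_kmers(sequences, 5).items():
    print(kmer, positions)
```

The three overlapping words found here (AUCCG, UCCGA, CCGAU) together 
mark the conserved region AUCCGAU; the regions to its left and right 
would then be aligned separately. 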

Some previous algorithms, such as those of 
Boguski et al.[78], Miller[79], and Miller et al.[80], construct 
blocks either by pair-wise alignment or by those not matched 
by all the N sequences. Instead of starting from pair-wise 
alignments, Match-Box[81] aims to identify conserved blocks 
(or boxes) among the sequences without performing a 
pair-wise alignment. Similarly, Zhao and Jiang[74] introduced 
the BMA algorithm, which allows for internal gaps and some 
degree of mismatch in the method used to identify the blocks. 

Based on a combination of local and global alignment, 
Dialign[71],[82],[83] makes extensive use of the 
segment-by-segment method. It combines local and global 
alignment features by identifying and adding the conserved 
regions (blocks) shared between the sequences based on their 
consistency weights. 

Based on the anchored alignment, CHAOS [84] used fast 
local alignments as "seeds" for a slower global-alignment. 
CHAOS is used to improve DIALIGN[71] and LAGAN[85]. 

Recently, Wang et al.[75] produced a block-based 
algorithm called BlockMSA. It combined the biclustering and 
divide-and-conquer approaches to align the sequences. 

3) Heuristic Optimization Approaches 

Many optimization problems from various fields have been 
solved by using diverse optimization algorithms. 
Computational intelligence (CI) plays an important role in 
solving the sequence alignment problem. 
Evolutionary algorithms have the advantage of operating on 
several solutions simultaneously, combining an exploratory 
search through the solution space with the exploitation of 
current results[15]. There are no restrictions on the number 
of sequences or their length. They are very flexible in 
optimizing the solution with low complexity. Many efforts 
have been made to solve the MSA problem using evolutionary 
programming[86],[87]. Since MSA is computationally difficult, 
no single method solves it best in all cases. 

Heuristic optimization approaches include genetic 
algorithms, ant colony optimization, swarm intelligence, 
simulated annealing, tabu search, and combinations thereof. In 
the following subsections, several heuristic 
optimization techniques are explained to show how they 
are applied to solve MSA problems. 

a) Genetic Algorithm 

A Genetic Algorithm (GA) is a heuristic search that performs 
an adaptive search to find optimal solutions of large-scale 
optimization problems with multiple local minima[15] using 
techniques that simulate natural evolution. 

GA is well suited for solving some NP-complete problems 
such as MSA. Sequence Alignment by Genetic Algorithm 
(SAGA)[18] is the earliest GA used to solve MSA 
problems. Within the GA approach there are different methods 
that can be applied to solve the MSA problem, such as those 
used in[13],[12],[17],[88],[19],[20]. 

Some methods are hybrids with other approaches. Zhang 
and Wong[89] presented a method that used a pair-wise dynamic 
programming (DP) technique based on GA. Similarly, the use of 
GA in a progressive approach has been presented in[90]. Later, 
Wang and Lefkowitz[91] produced the GenAlignRefine 
algorithm, which uses a genetic algorithm to improve local 
region alignment and thereby the overall quality 
of global multiple alignments. In[92], GA is used as an iterative 
method to refine the alignment score obtained by the 
progressive method. The use of GA to find the cut-off point in 
the divide-and-conquer approach is presented in[93]. Using 
similar combinations, a novel algorithm combining a genetic 
algorithm with ant colony optimization (GA-ACO) was presented 
by Lee et al.[94]. Chen et al.[95] reported a method which 
employs a new selection scheme to avoid premature convergence 
in GAs. Taheri and Zomaya[96] presented RBT-GA using a 
combination of the Rubber Band Technique (RBT) and the 
Genetic Algorithm (GA). Jeevitesh et al.[97] proposed the 
PASA algorithm, which used the alignment outputs of two 
MSA programs - MCoffee and ProbCons - and combined 
them in a genetic algorithm model. 
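
The general GA scheme applied to MSA can be sketched as follows. 
This is a deliberately small illustration and not the operator set of 
SAGA or of any other cited method: the fitness function (an 
unweighted sum-of-pairs score with a linear gap penalty), the 
row-wise crossover, the gap-moving mutation, the truncation 
selection, and all parameter values are assumptions. 

```python
import random
from itertools import combinations


def sp_score(rows, match=1, mismatch=-1, gap=-2):
    """Fitness: sum-of-pairs score with a linear gap penalty."""
    score = 0
    for a, b in combinations(rows, 2):
        for x, y in zip(a, b):
            if x == "-" and y == "-":
                continue
            score += gap if "-" in (x, y) else (match if x == y else mismatch)
    return score


def random_alignment(seqs, width, rng):
    """Pad every sequence to the same width with randomly placed gaps."""
    rows = []
    for s in seqs:
        gaps = set(rng.sample(range(width), width - len(s)))
        residues = iter(s)
        rows.append("".join("-" if i in gaps else next(residues)
                            for i in range(width)))
    return rows


def mutate(rows, rng):
    """Move one gap of a random row to a new position (residue order is kept)."""
    rows = list(rows)
    r = rng.randrange(len(rows))
    row = list(rows[r])
    gaps = [i for i, c in enumerate(row) if c == "-"]
    if gaps:
        i, j = rng.choice(gaps), rng.randrange(len(row))
        row.insert(j, row.pop(i))
        rows[r] = "".join(row)
    return rows


def crossover(p1, p2, rng):
    """Row-wise uniform crossover: each row is inherited whole from one parent."""
    return [rng.choice(pair) for pair in zip(p1, p2)]


def ga_align(seqs, generations=200, pop_size=30, seed=0):
    rng = random.Random(seed)
    width = max(map(len, seqs)) + 2
    pop = [random_alignment(seqs, width, rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=sp_score, reverse=True)
        elite = pop[: pop_size // 2]                 # truncation selection
        children = [mutate(crossover(rng.choice(elite), rng.choice(elite), rng), rng)
                    for _ in range(pop_size - len(elite))]
        pop = elite + children
    return max(pop, key=sp_score)


best = ga_align(["AUUCGA", "AUCGA", "AUUCA"])
print("\n".join(best), sp_score(best))
```

Because the best individuals are always carried over, the best score 
in the population never decreases from one generation to the next. 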

b) Ant Colony 

The ant colony optimization (ACO) algorithm is a probabilistic 
technique for solving computational problems. It belongs to the 
swarm intelligence family. The ACO algorithm is used as a 
cooperative search algorithm in solving optimization 
problems. ACO was inspired by the observation of the 
activities of real ants[98],[99],[100]. Recently, ACO has been 
used to solve NP-complete problems. 
It has shown efficiency in solving MSA problems, as 
reported in[101],[102], where each proposed algorithm 
was based on ant colony optimization and the 
divide-and-conquer technique. Other researchers, such 
as[103],[104],[27],[105], relied on the ant colony to solve the 
MSA problem in their research work. 



c) Particle Swarm Optimization 

Particle swarm optimization (PSO) is a swarm intelligence 
technique for numerical optimization. It simulates the 
behaviour of bird flocking or fish schooling. PSO was 
presented by Kennedy and Eberhart[106] in 1995. The 
simplicity of implementation, quick convergence, and few 
parameters have resulted in PSO gaining popularity. 
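
The core of PSO is the velocity and position update of each particle. 
The sketch below shows a common textbook form with an inertia weight, 
which is a later refinement rather than the original 1995 
formulation; the parameter values, the function name `pso`, and the 
sphere test function are assumptions used only for illustration. 

```python
import random


def pso(objective, dim, bounds, particles=20, iterations=100,
        inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` over a box using a basic particle swarm."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(particles)]
    vel = [[0.0] * dim for _ in range(particles)]
    best_pos = [p[:] for p in pos]                   # personal best positions
    best_val = [objective(p) for p in pos]
    g = min(range(particles), key=best_val.__getitem__)
    gbest, gbest_val = best_pos[g][:], best_val[g]   # swarm-wide best
    for _ in range(iterations):
        for i in range(particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Keep the particle inside the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < gbest_val:
                    gbest_val, gbest = val, pos[i][:]
    return gbest, gbest_val


def sphere(x):
    return sum(v * v for v in x)


print(pso(sphere, dim=2, bounds=(-5.0, 5.0)))
```

For MSA, a particle would encode an alignment (for example gap 
positions) rather than a point in a continuous box, which is why the 
cited works use modified or binary variants. 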

Many researchers have made modifications to the PSO idea 
and utilized this technique widely in solving MSA problems. 
Rasmussen and Krink[107] used a combination of particle 
swarm optimization and evolutionary algorithms to train 
HMMs for protein sequence alignment. Meanwhile, Pedro et 
al.[108] presented an algorithm based on PSO to improve a 
sequence alignment previously obtained using ClustalX. Juang 
and Su[109] produced an algorithm which combined pair-wise 
DP and particle swarm optimization (PSO) to overcome 
local optimum problems. Xu and Chen[110] designed an 
improved particle swarm optimization to solve MSA. Based on 
the idea of chaos optimization, Lei et al.[111] produced chaotic 
PSO (CPSO) to solve MSA. A novel algorithm of 
mutation-based binary particle swarm optimization (M-BPSO) 
was presented by Hai-Xia et al.[112] for solving MSA. 

d) Simulated Annealing 

Simulated annealing (SA) was described by Kirkpatrick[113]. It is 
an algorithm that attempts to simulate the physical process of 
annealing. The basic concept is based on observing the change of 
energy as materials solidify from the liquid state to the solid 
state[114]. 

Several SA algorithms have been used to solve the MSA problem. 
Kim et al.[115] used simulated annealing to develop the MSASA 
algorithm for solving MSA. Uren et al.[116] presented MAUSA, 
which uses simulated annealing to search through the space of 
possible guide trees. Meanwhile, Keith et al.[117] described a 
new algorithm for finding a consensus sequence by using the SA 
method. Omar et al.[118] produced a combination of Genetic 
Algorithm and Simulated Annealing to solve MSA problems. 
Roc[114] presented a method for multiple DNA sequence alignment 
in which an optimal cut-off point is chosen by genetic simulated 
annealing (GSA) techniques. Joo et al.[119] presented a new 
method called MSACSA for MSA, which is based on conformational 
space annealing (CSA). CSA combines three traditional global 
optimization methods: SA, genetic algorithm (GA), and Monte Carlo 
with minimization (MCM). 

e) Tabu Search 

Tabu search (TS) is a meta-heuristic approach used to solve 
combinatorial optimization problems. Tabu search and simulated 
annealing are similar in that both traverse the solution space by 
testing mutations of an individual solution. However, they differ 
in the number of generated solutions. While simulated annealing 
generates only one mutated solution, tabu search generates many 
mutated solutions and moves to the one with the lowest energy 
among those generated. TS has been used to solve MSA problems. 
Riaz et al.[120] implemented the adaptive memory features of tabu 
search to refine MSA. Lightner[121] used a tabu search approach 
to obtain multiple sequence alignments and explored iterative 
refinement techniques such as the hidden Markov model and the 
intensification heuristic approach to further improve the 
alignment. 

(IJCSIS) International Journal of Computer Science and Information Security, 
Vol. 9, No. 2, 2011 



F. Summary of Related Algorithms for MSA 

Table 2 lists the most current algorithms in use. The list is not 
exhaustive but includes the algorithms most closely related to 
those explained above. The Online Availability column gives the 
link to the online server or site from which the particular 
algorithm can be accessed or downloaded. 



TABLE II. Current MSA Algorithms 

Algorithm | Approach | RNA | Online Availability | Reference 
----------|----------|-----|---------------------|---------- 
MAFFT | Consistency | Y | http://mafft.cbrc.jp/alignment/server/ | [122] 
MUSCLE | Progressive/refinement | Y | http://www.ebi.ac.uk/Tools/msa/muscle/ | [123] 
Dialign2 | Consistency/segment | Y | http://bibiserv.techfak.uni-bielefeld.de/cgi-bin/dialign_submit | [71] 
Align-m | Consistency | N | http://bioinformatics.vub.ac.be/software/software.html | [42] 
BlockMSA | 3-way consistency/Block/DCA | Y | http://aug.csres.utexas.edu/msa/ | [75] 
MAUSA | SA | N | http://eprints.utas.edu.au/208/ | [116] 
SAGA | Iterative/Stochastic/GA | Y | http://www.tcoffee.org/Projects_home_page/saga_home_page.html | [18] 
Mishima | k-tuple | Y | http://esper.lab.nig.ac.jp/study/mishima/ | [124] 
MSAProbs | Pair-HMM and partition function | Y | http://sourceforge.net/projects/msaprobs/ | [72] 
pecan | Consistency/progressive | - | http://www.ebi.ac.uk/~bjp/pecan/ | [43] 
PicXAA | Posterior probability/consistency | Y | http://www.ece.tamu.edu/~bjyoon/picxaa/ | [48] 
PRIME | Group-to-group/anchor | Y | http://prime.cbrc.jp/ | [37] 
ProAlign | HMM/progressive | Y | http://applications.lanevol.org/ProAlign/ | [125] 
PROBCONS | Posterior probability/pair-HMM | N | http://probcons.stanford.edu/index.html | [38] 
ProDA | Repeated and shuffled elements | Y | http://proda.stanford.edu/ | [47] 
Probalign | Posterior probabilities | Y | http://probalign.njit.edu/probalign/login | [46] 
REFINER | Refinement/Block | - | ftp://ftp.ncbi.nih.gov/pub/REFINER | [126],[127] 
AIMSA | Region | - | - | [128] 
PRALINE | Profile/iterative/progressive | - | http://www.ibi.vu.nl/programs/pralinewww/ | [129] 
T-COFFEE | Consistency/progressive | Y | http://www.tcoffee.org/ | [40] 
MUMMALS | Probability HMM | N | http://prodata.swmed.edu/mummals/mummals.php | [44] 
PROMALS | k-mer/Pair-HMM consistency | Y | http://prodata.swmed.edu/promals/promals.php | [45] 
PCMA | k-mer/Profile/consistency | - | ftp://iole.swmed.edu/pub/PCMA/pcma/ | [73] 
BMA | Conserved block | Y | - | [74] 
GA-ACO | GA and Ant colony | - | - | [94] 
PASA | Refine by GA | - | - | [97] 


G. Harmony Search Algorithm 

The harmony search algorithm (HS) was developed by Geem[130]. HS 
is a meta-heuristic optimization algorithm based on music. 

HS simulates a team of musicians trying together to find the best 
state of harmony. Each player generates a sound based on one of 
three options (memory consideration, pitch adjustment, and random 
selection). This is equivalent to finding the optimal solution in 
an optimization process. 

Geem et al.[130] model the HS components as three quantitative 
optimization processes, as follows: 

The harmony memory (HM): It is used to keep good harmonies. A 
harmony from HM is selected randomly based on a parameter called 
the harmony memory considering (or accepting) rate, HMCR ∈ [0,1]. 
Typical values are HMCR = 0.7-0.95. 

The pitch adjustment: It is similar to a local search. It is used 
to generate a slightly different solution from the HM depending 
on the pitch-adjusting rate (PAR). PAR controls the adjustment, 
whose degree is bounded by the pitch bandwidth (brange). Most 
applications use PAR = 0.1-0.5. 

The random selection: A new harmony is generated randomly to 
increase the diversity of the solutions. The probability of 
randomization is Prandom = 1 - HMCR, and the actual probability 
of pitch adjustment is Ppitch = HMCR × PAR. 

The pseudo code of the basic HS algorithm with these three 
components is summarized in Figure 2. 

Harmony Search Algorithm 

Begin 
  Declare the objective function f(x), x = (x1, x2, ..., xn) 
  Initialize the harmony memory accepting rate (HMCR) 
  Initialize the pitch adjusting rate (PAR) and other parameters 
  Initialize the Harmony Memory with random harmonies 
  While (t < max number of iterations) 
    While (i <= number of variables) 
      If (rand < HMCR), 
        Choose a value from HM for variable i 
        If (rand < PAR), adjust the value by adding a certain amount 
        End if 
      Else choose a new random value 
      End if 
    End while 
    Calculate the objective function 
    Accept the new harmony (solution) if better 
    Update HM 
  End while 
  Find the current best solution in HM 
End 

Figure 2. Pseudo Code of the Harmony Search Algorithm[131] 
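
The pseudo code of Figure 2 can be turned into a small runnable 
sketch. The version below is only an illustration under stated 
assumptions: it minimizes a continuous function over a box, and 
the names harmony_search, bw (pitch bandwidth as a fraction of the 
variable range), and the sphere test function are hypothetical 
choices that do not come from the cited papers. 

```python
import random


def harmony_search(f, bounds, hms=10, hmcr=0.9, par=0.3, bw=0.05,
                   max_iter=2000, seed=0):
    """Minimize f over the box `bounds` with the basic HS loop."""
    rng = random.Random(seed)
    # Harmony memory: hms random solutions and their objective values.
    memory = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(hms)]
    scores = [f(x) for x in memory]
    for _ in range(max_iter):
        new = []
        for j, (lo, hi) in enumerate(bounds):
            if rng.random() < hmcr:
                # Memory consideration: reuse variable j of a stored harmony.
                value = memory[rng.randrange(hms)][j]
                if rng.random() < par:
                    # Pitch adjustment: small move bounded by the bandwidth.
                    value += rng.uniform(-1.0, 1.0) * bw * (hi - lo)
                    value = min(hi, max(lo, value))
            else:
                # Random selection: fresh value from the whole range.
                value = rng.uniform(lo, hi)
            new.append(value)
        new_score = f(new)
        # Accept the new harmony only if it beats the worst stored one.
        worst = max(range(hms), key=scores.__getitem__)
        if new_score < scores[worst]:
            memory[worst], scores[worst] = new, new_score
    best = min(range(hms), key=scores.__getitem__)
    return memory[best], scores[best]


def sphere(x):
    """Hypothetical test objective: sum of squares, minimum at the origin."""
    return sum(v * v for v in x)


best_x, best_f = harmony_search(sphere, [(-5.0, 5.0)] * 3)
```

With HMCR = 0.9 and PAR = 0.3, each variable is pitch-adjusted 
with probability HMCR × PAR = 0.27 and drawn at random with 
probability 1 - HMCR = 0.1, matching the probabilities given above. 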

Later, Geem[132] proposed an ensemble harmony search 
(EHS) where a new ensemble consideration operation is added 
to the original HS structure. The new operation takes into 
account the relationship among the decision variables, and the 
value of each decision variable can be chosen based on the 
other variables. 

Thereafter, Mahdavi et al.[133] produced an improved 
harmony search (IHS), in which the parameter PAR and pitch 
bandwidth are adjusted dynamically in the improvisation step. 

Omran and Mahdavi[134] proposed a global-best harmony search 
(GHS), in which the performance of HS is improved by borrowing 
concepts from swarm intelligence to modify the pitch-adjustment 
step such that the new harmony is assigned values from the best 
harmony in the HM. 

Meanwhile, Pan et al.[135] produced a local-best harmony search 
algorithm with dynamic subpopulations (DLHS) for solving 
continuous optimization problems. The DLHS algorithm differs from 
the existing HS in that the whole harmony memory (HM) is divided 
into many sub-HMs and independent processes are performed in each 
sub-HM. A periodic regrouping schedule is used to exchange 
information between the sub-HMs, so that population diversity and 
the accuracy of the final solution are maintained. In addition, 
the parameters are adjusted using a newly developed adaptive 
strategy so that they suit the particular problem or phase of the 
search process. 

Recently, Zou et al.[136] proposed a novel global harmony search 
algorithm (NGHS) to solve reliability problems. 

NGHS modifies the improvisation step of HS. Position updating and 
genetic mutation are the new operations included in NGHS. 
Position updating enables the worst harmony of HM to move rapidly 
toward the global best harmony, while genetic mutation prevents 
NGHS from becoming trapped in a local optimum. 

III. The Proposed Algorithm 

In this article, several algorithms are proposed to solve the MSA 
problem using an adapted harmony search algorithm (HS). The 
adaptive HS for MSA is explained in subsection 3.1. A modified HS 
algorithm for reducing the search space is explained in 
subsection 3.2. Subsection 3.3 describes the HS Improver. 
Finally, subsection 3.4 introduces a parallel HS-MSA that can be 
implemented on different parallel platforms such as multi-core 
CPUs and GPUs. Figure 3 shows the stages of the proposed research 
framework. 

Figure 3. Research Framework. 

A. Proposed Harmony Search Algorithm for MSA 

The main goal of MSA algorithms is to detect and align the 
homologous regions across the different sequences. This is 
achieved by optimizing an objective function that measures the 
quality of the alignment. Harmony search is a recent 
meta-heuristic optimization algorithm that has been applied to 
NP-complete problems[137]. This subsection explains how the 
harmony search algorithm can be used to solve the MSA problem. 
The alignment representation, objective function, harmony memory 
initialization, and adaptive harmony search algorithm for MSA are 
explained in detail. 

1) Alignment Representation 

An alignment of N sequences with lengths $L_1$ to $L_N$ is 
represented as an $N \times W$ matrix in which each row encodes 
the gap positions of one sequence. The row length is 
$W = \lceil a L_{\max} \rceil$, where 
$L_{\max} = \max\{L_1, L_2, \ldots, L_N\}$, $\lceil x \rceil$ is 
the smallest integer greater than or equal to x, and the 
parameter a is a scaling factor[86]. The value of a is chosen 
according to the probability distribution; it can be 1.2 as used 
in[94] or 1.5 as used in[138],[13],[20]. Choosing 1.2 allows the 
aligned sequences to be 20% longer than the longest sequence, 
while choosing 1.5 allows the alignment to be 50% longer than the 
longest sequence in the test, as in[138]. 
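
As a small worked illustration of this representation, the sketch 
below computes W and expands one row from its encoded gap 
positions. The function names alignment_width and decode_row are 
hypothetical, and the 1-based column numbering is an assumption 
chosen to match the example of Figure 4. 

```python
import math


def alignment_width(lengths, a=1.2):
    """Return W = ceil(a * Lmax) for the given sequence lengths."""
    return math.ceil(a * max(lengths))


def decode_row(sequence, gap_positions, width):
    """Expand a sequence into an aligned row of `width` columns.

    gap_positions are 1-based column indices; exactly W - L of them
    are expected, all inside 1..W.
    """
    gaps = set(gap_positions)
    if len(gaps) != width - len(sequence) or not all(1 <= g <= width for g in gaps):
        raise ValueError("expected W - L distinct gap positions in 1..W")
    residues = iter(sequence)
    return "".join("-" if col in gaps else next(residues)
                   for col in range(1, width + 1))


W = alignment_width([5, 7, 4, 7, 6])        # sequence lengths used in Figure 4
row = decode_row("AUCAA", [1, 4, 7, 8], W)  # first sequence of Figure 4
```

Storing only the W - Li gap columns per row keeps the encoding 
compact, since the residues themselves never change. 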

2) Objective Function 

To find the optimal solution in HS-MSA, the sum-of-pairs (SP) 
score described in[139],[140],[10],[107] will be used to 
calculate the Objective Function (OF) when there is no prior 
knowledge of the reference alignment. The general form of the OF 
score of an alignment of n sequences consisting of M columns is: 

$$OF = \sum_{i=1}^{l} \left\{ S_n(m_i) - G_n(m_i) \right\},$$ 

where $S_n(m_i)$ is the similarity score of column $m_i$, 
$G_n(m_i)$ is the gap penalty of column $m_i$, and $l$ is the 
sequence length, that is, the number of columns M of the 
alignment. The similarity score of column $m_i$ can be measured 
by the sum-of-pairs (SP). The SP-score $S(m_i)$ for the i-th 
column $m_i$ is calculated as follows: 

$$S(m_i) = \sum_{j=1}^{n-1} \sum_{k=j+1}^{n} s(m_i^j, m_i^k),$$ 

where $m_i^j$ is the j-th row in the i-th column. For aligning 
two residues x and y, the substitution matrix s(x,y) is used to 
give the similarity score. 
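
A minimal sketch of this objective function is given below. It is 
not the scoring scheme of the paper: the match/mismatch values 
stand in for a real substitution matrix s(x,y), pairs involving a 
gap are assumed to contribute nothing to S, and the gap penalty G 
is assumed to be a fixed cost per gap symbol. All function names 
are hypothetical. 

```python
from itertools import combinations


def pair_score(x, y, match=1, mismatch=-1):
    """Toy substitution score s(x, y); a stand-in for a real matrix."""
    return match if x == y else mismatch


def column_sp(column):
    """Sum-of-pairs similarity S(m_i) over all residue pairs of a column."""
    residues = [c for c in column if c != "-"]
    return sum(pair_score(x, y) for x, y in combinations(residues, 2))


def column_gap_penalty(column, gap_cost=1):
    """Assumed gap penalty G(m_i): a fixed cost per gap in the column."""
    return gap_cost * column.count("-")


def objective(alignment):
    """OF = sum over columns of S(m_i) - G(m_i); rows have equal length."""
    total = 0
    for column in zip(*alignment):
        total += column_sp(column) - column_gap_penalty(column)
    return total


of_value = objective(["AU-", "AUC", "A-C"])
```

In the usage line, the first column contributes three matching 
pairs, while the other two columns each contribute one matching 
pair minus one gap, so the columns score 3, 0, and 0. 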

3) Harmony Memory Initialization 
For a given set of five sequences, the harmony memory is 
initialized as follows: the maximum sequence length is MaxS = 7, 
the minimum sequence length is MinS = 4, the maximum length of 
the alignment is $W = \lceil 1.2 \times 7 \rceil = 9$, the 
maximum number of gaps in sequence Si is (W - Li), where Li is 
the length of sequence i, and the maximum number of gaps is 
Gs = 9 - 4 = 5. 

Sequence | Length Li | Generated gap positions (W - Li) | Gap positions sorted ascending (W - Li) 
---------|-----------|----------------------------------|---------------------------------------- 
AUCAA    | 5         | 4 1 8 7                          | 1 4 7 8 
UAAUCAA  | 7         | 3 2                              | 2 3 
AUCA     | 4         | 3 4 7 8 9                        | 3 4 7 8 9 
UAAUCAU  | 7         | 6 2                              | 2 6 
AUGAUU   | 6         | 7 2 9                            | 2 7 9 

A. Gap positions 

- A U - C A - - A 
U - - A A U C A A 
A U - - C A - - - 
U - A A U - C A U 
A - U G A U - U - 

B. Aligned sequences 

Figure 4. Harmony memory initialization 



The initial harmony memory is randomly generated, and its rows 
are initialized as follows. First, for each sequence Si with 
length Li, a random permutation of (W - Li) gap positions is 
drawn from the range 1 to W. Second, those (W - Li) numbers are 
sorted and used to indicate where the corresponding gaps are 
placed in the matrix. Finally, the positions in the matrix rows 
that are not occupied by gaps are filled with the base symbols 
taken from the original sequence. 

The random initialization procedure that produces the initial 
harmony memory is illustrated in Figure 4. It is similar to the 
procedure used in[94]. The first difference is that our procedure 
generates the gap positions rather than the residue positions as 
in[94]; for each sequence, fewer gap positions than residue 
positions have to be generated. The second difference, related to 
the first step, is that the number of permuted values is (W - Li) 
and not W as in[94]. 
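
The initialization procedure described above can be sketched as 
follows. This is an illustration only: the function name 
init_harmony_memory, the seed argument, and the use of 
random.sample to draw the gap positions without repetition are 
assumptions, not details taken from the paper. 

```python
import math
import random


def init_harmony_memory(sequences, hms, a=1.2, seed=None):
    """Return (W, memory); memory holds hms random harmonies.

    Each harmony has one row per sequence, and each row is the sorted
    list of W - Li gap positions (1-based columns) of that sequence.
    """
    rng = random.Random(seed)
    width = math.ceil(a * max(len(s) for s in sequences))
    memory = []
    for _ in range(hms):
        harmony = []
        for s in sequences:
            # Draw W - Li distinct columns from 1..W, then sort them.
            gaps = rng.sample(range(1, width + 1), width - len(s))
            harmony.append(sorted(gaps))
        memory.append(harmony)
    return width, memory


example = ["AUCAA", "UAAUCAA", "AUCA", "UAAUCAU", "AUGAUU"]
width, memory = init_harmony_memory(example, hms=4, seed=1)
```

For the five sequences of Figure 4, every harmony contains rows 
of 4, 2, 5, 2, and 3 gap positions respectively, all within 1 to 
W = 9. 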

4) Adaptive Harmony Search Algorithm for MSA (AHS- 
MSA) 

The purpose of AHS-MSA is to aid scientists in producing 
high-quality MSAs that may lead to better RNA structure 
prediction (Figure 5) and help with other issues in molecular 
biology. In reviewing the approaches to solving the MSA problem 
or to predicting multiple RNA secondary structures, we have found 
no studies that incorporate the harmony search algorithm. The 
only research that has involved HS in bioinformatics is that of 
Mohsen et al.[141], which predicted the secondary structure of a 
single RNA sequence based on Minimum Free Energy. 

Figure 5. The impact of MSA in RNA secondary structure prediction 



The HS algorithm has been successfully applied to several 
optimization problems[142]. This study therefore aims to 
investigate the use and adaptation of the HS algorithm in finding 
solutions to MSA problems. The MSA problem can be considered an 
optimization problem to be solved with minimal compromise of 
accuracy, complexity, and speed. MSA can be resolved by adapting 
the harmony search algorithm. Moreover, HS possesses several 
advantages over conventional optimization techniques[143], such 
as: 

1. HS does not require initial value settings for decision 
variables; 

2. HS is a population-based meta-heuristic algorithm, which 
means that a group of multiple harmonies can be used 
simultaneously. Proper parallelism usually leads to better 
performance with higher efficiency and speed; 

3. HS uses stochastic random searches which explore the 
search space more widely and efficiently; 

4. HS does not need derivation information; 

5. HS is less sensitive to chosen parameters; 

6. HS can solve various NP-complete problems[137]; 

7. The structure of the HS algorithm is relatively simple; 

8. HS is a very successful meta-heuristic algorithm due to its 
way of handling intensification and diversification; 

9. HS is very versatile, being able to combine with other 
meta-heuristic algorithms[134]. 

These characteristics increase the reliability and flexibility 
of the HS algorithm in producing better solutions. 

The AHS-MSA algorithm, as described in Figure 6, combines and 
adapts the HS idea to solve the MSA problem. The steps of the 
AHS-MSA algorithm are as follows: 

1. Initialize the harmony parameters (HMCR, PAR, NI, and HMS). 

2. Initialize the harmony memory with HMS random harmonies. Each 
solution is an alignment. 

3. Calculate the objective function (OF) for each harmony. 

4. Improvise the new harmony. 

5. Accept/reject the new harmony. 

6. Update the harmony memory. 
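
The six steps above can be sketched end to end as follows. This 
is a hedged illustration, not the authors' implementation: the 
toy objective (+1 match, -1 mismatch, -1 per gap), the pitch 
adjustment that moves one gap to a free column, and all function 
names are assumptions made for the example. 

```python
import math
import random
from itertools import combinations


def decode(seq, gaps, width):
    """Build an aligned row from a sequence and its 1-based gap columns."""
    gapset, it = set(gaps), iter(seq)
    return "".join("-" if c in gapset else next(it) for c in range(1, width + 1))


def score(sequences, harmony, width):
    """Toy objective: +1 match, -1 mismatch per residue pair, -1 per gap.

    With a fixed W the gap count is constant, so the gap term only
    shifts the score; it is kept to mirror the form OF = S - G.
    """
    rows = [decode(s, g, width) for s, g in zip(sequences, harmony)]
    total = 0
    for col in zip(*rows):
        residues = [c for c in col if c != "-"]
        total += sum(1 if x == y else -1 for x, y in combinations(residues, 2))
        total -= col.count("-")
    return total


def random_gaps(rng, width, n_gaps):
    return sorted(rng.sample(range(1, width + 1), n_gaps))


def adjust(rng, gaps, width):
    """Assumed pitch adjustment: move one gap to a currently free column."""
    free = [c for c in range(1, width + 1) if c not in gaps]
    if not gaps or not free:
        return list(gaps)
    moved = list(gaps)
    moved[rng.randrange(len(moved))] = rng.choice(free)
    return sorted(moved)


def ahs_msa(sequences, hms=10, hmcr=0.9, par=0.3, ni=500, a=1.2, seed=0):
    rng = random.Random(seed)
    width = math.ceil(a * max(len(s) for s in sequences))
    n_gaps = [width - len(s) for s in sequences]
    # Steps 1-3: parameters, random harmony memory, objective values.
    memory = [[random_gaps(rng, width, k) for k in n_gaps] for _ in range(hms)]
    scores = [score(sequences, h, width) for h in memory]
    for _ in range(ni):
        # Step 4: improvise a new harmony, one sequence row at a time.
        new = []
        for j, k in enumerate(n_gaps):
            if rng.random() < hmcr:
                gaps = list(memory[rng.randrange(hms)][j])
                if rng.random() < par:
                    gaps = adjust(rng, gaps, width)
            else:
                gaps = random_gaps(rng, width, k)
            new.append(gaps)
        # Steps 5-6: accept if better than the worst, then update memory.
        new_score = score(sequences, new, width)
        worst = min(range(hms), key=scores.__getitem__)
        if new_score > scores[worst]:
            memory[worst], scores[worst] = new, new_score
    best = max(range(hms), key=scores.__getitem__)
    rows = [decode(s, g, width) for s, g in zip(sequences, memory[best])]
    return rows, scores[best]


seqs = ["AUCAA", "UAAUCAA", "AUCA", "UAAUCAU", "AUGAUU"]
rows, best_score = ahs_msa(seqs, ni=200)
```

Because each row is improvised independently, memory 
consideration can recombine rows from different stored 
alignments, which is the main way this sketch explores the search 
space. 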

Figure 6. The flowchart of the proposed HS-MSA algorithm 

B. A Modified Harmony Search Algorithm for MSA (MHS- 
MSA) 

To reduce the search space, a combination of methods is proposed. 
A hybrid of HS and a segment-based approach is explained in 
subsection 3.2.1. A hybrid of HS with a combination of 
segment-based and divide-and-conquer approaches is explained in 
subsection 3.2.2. 

3.2.1 A Harmony Search algorithm with a Segment-based 
Approach 

Recently, identifying areas of local conservation before finding 
the global alignment has been gaining popularity among 
researchers. Conserved regions can be a helpful guide in 
identifying the homology of sequences and assisting the process 
of MSA. This idea is not new and has been implemented in other 
algorithms such as DIALIGN[49], MLAGAN[85], CHAOS[84], 
align-m[42], and MAFFT[144], where blocks are first detected from 
the pair-wise sequence alignments and that information is then 
used to build the MSA. Another algorithm, MISHIMA[124], also uses 
this idea, exploring and analyzing k-tuples from the original 
sequences. In the same way, well-aligned regions are identified 
in RASCAL[51],[128], where a consistency-based objective function 
called NorMD[50] is used. 

The method proposed here reduces the search space of the previous 
AHS-MSA algorithm by combining pair-wise alignments into multiple 
alignments. It works by finding the conserved blocks across all 
the sequences before starting the MSA process. It explores all 
possible regions to find those that are more correct and 
consistent. All matched blocks are used to guide the MSA. The 
idea is first to detect the conserved blocks in the sequences 
pair-wise and then to apply HS to build the MSA from those 
conserved columns. 

The multiple alignment search space can be narrowed down to a 
number of possible regions per sequence pair. If the residue 
pairs in these regions are consistent with each other, they are 
considered acceptable. Consistency means that if symbol Ai 
(residue i of sequence A) is aligned correctly with symbol Bj, 
and Bj with Ck, then Ai and Ck should also be aligned. This 
property can therefore be used to identify the consistent parts 
among all the pair-wise alignments, which are considered 
acceptable, while the gap positions can be defined in the 
remaining regions. 

The ability to determine the well-aligned regions has at least 
two advantages. It prevents those regions from being changed 
later in the process. Additionally, it speeds up the optimization 
process. The modified steps of the HS-MSA algorithm can be 
summarized as follows: 

1. Find all possible residue pairs in each sequence pair using 
the pair-wise algorithm. 

2. By using the consistency concept, find all possible blocks 
or columns that are acceptable. 

3. Calculate the score value for each block by using the sum- 
of-pairs objective function. 

4. Identify and analyze the potentially useful blocks, and 
select those that are more consistent with each other. 

5. Apply the HS algorithm to initialize the final alignment 
from these blocks and find the optimal alignment. 
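
Step 2 relies on the consistency (transitivity) property. One way 
it could be checked is sketched below, under the assumption that 
each pair-wise alignment is given as a set of matched residue 
index pairs. The data layout, the function names, and the exact 
rejection rule are hypothetical choices for illustration. 

```python
def build_partner_map(pairwise):
    """pairwise[(a, b)] is a set of (i, j): residue i of a aligned with j of b.

    Returns partner[(seq, residue, other_seq)] -> residue in other_seq.
    """
    partner = {}
    for (a, b), pairs in pairwise.items():
        for i, j in pairs:
            partner[(a, i, b)] = j
            partner[(b, j, a)] = i
    return partner


def contradicts(partner, a, i, b, j, c):
    """True if a third sequence c breaks transitivity for the pair (a_i, b_j)."""
    ka = partner.get((a, i, c))
    kb = partner.get((b, j, c))
    # a_i and b_j are aligned to different residues of c.
    if ka is not None and kb is not None and ka != kb:
        return True
    # a_i ~ c_k, but c_k is aligned to another residue of b.
    if ka is not None and partner.get((c, ka, b), j) != j:
        return True
    # b_j ~ c_k, but c_k is aligned to another residue of a.
    if kb is not None and partner.get((c, kb, a), i) != i:
        return True
    return False


def consistent_pairs(pairwise, n_sequences):
    """Keep only residue pairs that no third sequence contradicts."""
    partner = build_partner_map(pairwise)
    kept = {}
    for (a, b), pairs in pairwise.items():
        others = [c for c in range(n_sequences) if c not in (a, b)]
        kept[(a, b)] = {(i, j) for i, j in pairs
                        if not any(contradicts(partner, a, i, b, j, c)
                                   for c in others)}
    return kept


toy = {(0, 1): {(0, 0), (1, 1)},
       (1, 2): {(0, 0), (1, 2)},
       (0, 2): {(0, 0), (1, 1)}}
kept = consistent_pairs(toy, 3)
```

In the toy input, residue 0 is aligned consistently across all 
three sequences and survives, whereas the three pairs involving 
residue 1 disagree about which residue of the third sequence 
matches and are all dropped. 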

3.2.2 A Harmony Search algorithm with Segment-based and 
Divide-and-conquer Approaches 
The previously proposed method can be extended by combining it 
with the divide-and-conquer (DAC)[145] method. 

Sammeth et al.[146] and Kryukov and Saitou[124] used the DCA 
approach in solving MSA. Kryukov and Saitou[124] produced an 
adapted DCA in which k-tuples are used to find the segments, 
which are then aligned by CLUSTALW and MAFFT. Sammeth et 
al.[146], on the other hand, integrated the global 
divide-and-conquer approach with the local segment-based approach 
of DIALIGN. 

A set of consistent columns can form segments in the alignment. 
The DCA protocol cuts the sequences at a point and repeats that 
cutting procedure until the sub-sequences no longer exceed a 
given length threshold. Then the obtained sub-sequences are 
aligned independently and the results are combined to form a 
complete MSA. The method proceeds as follows: 

1. Find all possible residue pairs in each sequence pair using 
the pair-wise algorithm. 

2. By using the consistency concept, find all the possible 
blocks or columns that are acceptable. 

3. Calculate the score value for each column by using the 
sum-of-pairs objective function. 

4. Identify and analyze the potentially useful columns, and 
select those that are more consistent with each other. 

5. Add these conserved blocks/fragments to the fragment set F; 
they can be considered as cutting points. 

6. Divide the sequences into sub-sequences based on these cutting 
points. 

7. Apply the HS algorithm to construct the final alignment 
from these regions and find the optimal one. 
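
Steps 6 and 7 amount to splitting every sequence at the chosen 
cutting points, aligning each group of sub-sequences separately, 
and concatenating the results. The sketch below shows only the 
splitting and joining; the representation of a cutting point as 
one 0-based index per sequence and the function names are 
assumptions, and the per-segment aligner is left out. 

```python
def split_at_cut_points(sequences, cut_points):
    """Split all sequences at the given cutting points.

    cut_points is a list of tuples holding one cut index per sequence,
    ordered from left to right. Returns a list of segments, each being
    the list of sub-sequences to align independently.
    """
    cuts = [tuple(c) for c in cut_points]
    starts = [tuple(0 for _ in sequences)] + cuts
    ends = cuts + [tuple(len(s) for s in sequences)]
    return [[s[b:e] for s, b, e in zip(sequences, start, end)]
            for start, end in zip(starts, ends)]


def join_alignments(aligned_segments):
    """Concatenate per-segment alignments row by row into one alignment."""
    n_rows = len(aligned_segments[0])
    return ["".join(segment[r] for segment in aligned_segments)
            for r in range(n_rows)]


segments = split_at_cut_points(["AUCAA", "UAAUCAA"], [(2, 4)])
joined = join_alignments([["--AU", "UAAU"], ["CAA", "CAA"]])
```

Here the conserved block CAA serves as the cutting point, so only 
the short prefixes AU and UAAU still need to be aligned by the 
search. 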

C. A Harmony Search Algorithm Improver for MSA (HSI- 
MSA) 

Another method proposed in our research is HSI-MSA, which 
combines many multiple alignments into one improved alignment. 
Any conventional MSA program, or a combination of them, can 
initialize the harmony memory. The harmony search algorithm can 
then be applied as an iterative method to refine/combine the 
alignments and find the best alignment result. Here HS takes on 
the role of an improver of the accuracy of the current 
alignment. The goal of this study is to investigate whether this 
approach improves the accuracy of the different alignments. This 
improver idea is similar to the PASA algorithm[97], which used a 
genetic algorithm model to combine the alignment outputs of two 
MSA programs, M-Coffee and ProbCons. It has also been used in 
ComAlign[147], M-Coffee[148], and AQUA[52]. The proposed method 
can be summarized as follows: 

1. Initialize the harmony memory by using well-known MSA 
algorithms, including our alignment obtained in the previous 
step. 

2. Calculate the score for each alignment. 

3. Apply the HS algorithm to improve and find the optimal 
alignment. 

This combines parts of the different alignments to find an 
optimal alignment among them, rather than just selecting the best 
of them. 

D. A Parallel Harmony Search Algorithm for MSA (PHS- 
MSA) 

In addition to the foregoing proposed methods, another way to 
reduce the computational complexity and running time is to 
parallelize the HS-MSA algorithm using multi-core and multi-GPU 
platforms. 

CUDA (Compute Unified Device Architecture) is an extension of 
C/C++ developed by NVIDIA to run thousands of threads in 
parallel[149] on GPUs[150]. GPU architectures are "manycore", 
with hundreds of cores[149]. GPUs were implemented as streaming 
processors. 

The GPU is a good alternative for high performance computing and 
is expected to improve further in the near future. Furthermore, 
availability, low price, and easy installation are the main 
advantages[151] of GPUs compared to other architectures. 

Re-developing the algorithm and the data structure based on 
computer graphics concepts is the main obstacle facing the use of 
GPUs[151],[152]. Moreover, other limitations that stem from the 
streaming architecture have to be taken into consideration (i.e. 
random memory access, cross fragment, persistent state). 

Many researchers have shown the design and implementation of 
bioinformatics algorithms using GPUs. Examples that use GPUs to 
parallelize sequence alignment algorithms in bioinformatics 
are[153],[154],[151],[155],[156],[157]. 

Our approach is motivated by the rapidly increasing power of 
GPUs. We propose to implement the HS-MSA algorithm on NVIDIA 
GPUs in order to explore and develop high performance solutions 
for multiple sequence alignment. HS-MSA will be implemented in 
CUDA on an NVIDIA GeForce 9400 GT. The computation will be 
conducted on NVIDIA GPUs installed in a computer with a 2.66 GHz 
Intel Core 2 Quad CPU and 3 GB RAM, running Microsoft Windows XP 
Professional. 

Moreover, to utilize multiple CPU threads and incorporate GPU 
devices into one single program, the proposed method can be 
extended to use hybrid multi-core and GPU code through CUDA and 
OpenMP. This can lead to quicker implementation and greater 
efficiency on both GPU and multi-core CPU[158]. 

IV. Evaluation and Analysis 

To evaluate and analyse the performance of the proposed HS-MSA 
algorithm in greater depth, an objective criterion is needed to 
assess the quality of the aligned sequences. The quality attained 
can be evaluated by comparing the test alignment with the 
reference alignment[139]. 

The comparison can use scores that depend on the alignment itself 
(e.g., Sum-of-Pairs, Total Column Score) or are independent of it 
(structure sensitivity and selectivity). This section describes 
in detail the benchmark dataset, the reference comparison, the 
alignment comparison, and the structure comparison, which can be 
used to evaluate the test alignments. 

A. Benchmark Dataset 

The proposed algorithm will be tested using the following 
datasets: Rfam, BRAliBase 2.1, Comparative RNA website (CRW), the 
Ribonuclease P database, 5S Ribosomal RNA database, tmRNA, tRNA, 
SRPDB, RNAdb, and ncRNA, as explained in section 2.6. Different 
RNA datasets will be used from a variety of families and lengths, 
such as 5S (5S.B.alphaproteobacteria, 5S.B.betaproteobacteria, 
5S.B.actinobacteria) and 16S (16S.B.fibrobacteres, 
16S.E.entamoebidae, 16S.E.perkinsea) ribosomal RNA. 



B. Reference Comparison 

Assessing the quality of the aligned sequences requires a 
reference alignment from the benchmark database. The comparison 
is between the test alignment and the reference alignment. 

Sum-of-pairs (SPS) and column score (CS) are two different score 
functions that can be used for this comparison. The SPS score is 
the percentage of correctly aligned residue pairs in the test 
alignment that also occur in the reference alignment[159]. The CS 
score is the percentage of entire columns in the test alignment 
that occur completely in the reference alignment[159]. 

In a given test alignment consisting of M columns, the ith column 
is denoted by $A_{i1}, A_{i2}, \ldots, A_{iN}$, where N is the 
number of sequences. For each pair of residues $A_{ij}$ and 
$A_{ik}$, $p_i(j,k)$ is defined such that $p_i(j,k) = 1$ if 
residues $A_{ij}$ and $A_{ik}$ from the test alignment are 
aligned with each other in the reference alignment, otherwise 
$p_i(j,k) = 0$. The score of the ith column can be calculated as 
follows: 

$$S_i = \sum_{j=1,\, j \neq k}^{N} \sum_{k=1}^{N} p_i(j,k).$$ 

Then, the sum-of-pairs score for a given test alignment can be 
calculated as follows: 

$$SPS = \frac{\sum_{i=1}^{M} S_i}{\sum_{i=1}^{M_r} S_{ri}},$$ 

where $M_r$ is the number of columns in the reference alignment 
and $S_{ri}$ is the score $S_i$ for the ith column in the 
reference alignment. 

Column score (CS): Using the same symbols as above, the score 
$C_i$ of the ith column is equal to 1 if all the residues in that 
column are aligned in the reference alignment, otherwise it is 
equal to 0. Therefore, the column score is: 

$$CS = \frac{\sum_{i=1}^{M} C_i}{M}.$$ 

To compare the test alignment with the corresponding reference 
alignment, the sum-of-pairs function and column score are used as 
described in[139],[107],[160],[161],[162]. 
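
Both scores can be computed directly from two gapped alignments 
of the same sequences. The sketch below is an illustration under 
assumptions: alignments are lists of equal-length strings with 
"-" for gaps, residue pairs are counted once (unordered) in both 
numerator and denominator, which leaves the SPS ratio unchanged, 
and a test column counts for CS only if an identical column, 
including its gap pattern, exists in the reference. The function 
names are hypothetical. 

```python
from itertools import combinations


def residue_columns(alignment):
    """For each column, give every row's residue index (None for a gap)."""
    counters = [0] * len(alignment)
    columns = []
    for col in zip(*alignment):
        entry = []
        for row, ch in enumerate(col):
            if ch == "-":
                entry.append(None)
            else:
                entry.append(counters[row])
                counters[row] += 1
        columns.append(tuple(entry))
    return columns


def aligned_pairs(columns):
    """Set of (row_j, residue_a, row_k, residue_b) aligned in some column."""
    pairs = set()
    for col in columns:
        for (j, a), (k, b) in combinations(enumerate(col), 2):
            if a is not None and b is not None:
                pairs.add((j, a, k, b))
    return pairs


def sps(test, reference):
    """Fraction of reference residue pairs reproduced by the test alignment."""
    ref_pairs = aligned_pairs(residue_columns(reference))
    test_pairs = aligned_pairs(residue_columns(test))
    return len(test_pairs & ref_pairs) / len(ref_pairs)


def cs(test, reference):
    """Fraction of test columns that appear unchanged in the reference."""
    ref_cols = set(residue_columns(reference))
    test_cols = residue_columns(test)
    return sum(col in ref_cols for col in test_cols) / len(test_cols)


reference = ["AU-C", "AUGC"]
test = ["AUC-", "AUGC"]
sps_value = sps(test, reference)
cs_value = cs(test, reference)
```

In the usage lines, the test alignment reproduces two of the 
three reference residue pairs and two of its four columns match 
the reference, so SPS is 2/3 and CS is 0.5. 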

C. Alignment Comparison 

This comparison evaluates the performance of the proposed 
algorithm with respect to other MSA aligners. Typically, MSA 
aligners are validated by using a benchmark data set of reference 
alignments. 

The sum-of-pairs (SPS) and column scores (CS) of every alignment 
produced by each aligner program, including our proposed 
algorithm, are computed against the reference alignment. 

The proposed HS-MSA algorithm can thus be compared with the 
commonly used MSA programs on the above reference alignment 
benchmark. 

D. Structure Comparison 

It might be expected that a more accurate alignment would lead to 
a more accurate RNA secondary structure. The proposed method 
investigates the impact of alignment accuracy on the accuracy of 
the RNA secondary structure using standard benchmarks, in 
comparison with the common well-known MSA algorithms. 

Both the alignment process and the prediction process can affect 
the accuracy of the secondary structure prediction, but here only 
the alignment process is investigated. 

The evaluation is performed with respect to sensitivity, 
selectivity or positive predictive value (PPV), and Matthews 
correlation coefficient (MCC) of the RNA secondary structure, as 
used by Gardner and Giegerich[163]. The secondary structure of 
the test alignment produced by the proposed algorithm will be 
compared with that of others. The sensitivity and selectivity of 
the alignment process will be studied to investigate the effect 
of the proposed aligner on the accuracy of the structure, as 
shown in Figure 7. 
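
For reference, the three measures can be computed from the sets 
of predicted and reference base pairs as sketched below. The 
sketch uses the textbook definitions of sensitivity, PPV, and MCC 
and assumes that every pair of positions not counted as a true 
positive, false positive, or false negative is a true negative; 
the counting rules of the cited benchmark may differ, and the 
function name is hypothetical. 

```python
import math


def structure_metrics(predicted, reference, length):
    """Return (sensitivity, PPV, MCC) for sets of base pairs (i, j), i < j.

    `length` is the sequence length; all length*(length-1)/2 position
    pairs are treated as candidate base pairs when counting negatives.
    """
    tp = len(predicted & reference)
    fp = len(predicted - reference)
    fn = len(reference - predicted)
    tn = length * (length - 1) // 2 - tp - fp - fn
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    ppv = tp / (tp + fp) if tp + fp else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, ppv, mcc


predicted = {(0, 9), (1, 8), (2, 7)}
reference = {(0, 9), (1, 8), (3, 6)}
sens, ppv, mcc = structure_metrics(predicted, reference, length=10)
```

In this toy case two of the three reference pairs are recovered 
and one predicted pair is wrong, so sensitivity and PPV are both 
2/3. 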
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paper proposes a novel meta-heuristic method to solve the 
MSA problem. A meta-heuristic algorithm (HS-MSA), which 
has not been used up to now, is proposed for multiple sequence 
alignment that promises to greatly speed up the alignment 
process and improve its accuracy. The optimization method 
introduced herein is inspired by the so-called harmony search 
algorithm (HS). A new optimization algorithm for the 
combination of HS-MSA with segment-based multiple- 
alignment problem is also proposed and extended to include the 
parallel techniques. 
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Figure 7. Structure comparison 



V. Conclusion 



Multiple sequence alignment is a fundamental technique in 
many bioinformatics applications. Many algorithms have been 
developed to achieve optimal alignment. Some programs are 
exhaustive in nature; some are heuristic. Because exhaustive 
programs are not feasible in most cases, heuristic programs are 
commonly used. These include progressive, iterative, and 
block-based approaches. 

This paper describes briefly the basic concepts of MSA and 
reviews the common approaches in MSA. To this end, this 

Abstract — The aim of this paper is to design a fuzzy logic 
controller-based model reference adaptive intelligent 
controller. It consists of a fuzzy logic controller operating 
alongside a conventional Model Reference Adaptive Controller 
(MRAC). The idea is to control the plant with a conventional 
model reference adaptive controller using a suitable single 
reference model and, at the same time, with a fuzzy logic 
controller. In the conventional MRAC scheme, the controller is 
designed so that the plant output converges to the reference 
model output, on the assumption that the plant is linear. This 
scheme controls linear plants with unknown parameters 
effectively. However, using MRAC to control a nonlinear system 
in real time is difficult. This paper proposes incorporating a 
fuzzy logic controller (FLC) into the MRAC to overcome this 
problem. The control input is the sum of the output of the 
conventional MRAC and the output of the fuzzy logic controller. 
The rules for the fuzzy logic controller are obtained from a 
conventional PI controller. The proposed fuzzy logic 
controller-based model reference adaptive controller can 
significantly improve the system's behavior, forcing the system 
to follow the reference model and minimizing the error between 
the model and plant outputs. 

Keywords-Model Reference Adaptive Controller (MRAC), 
Fuzzy Logic Controller (FLC), Proportional-Integral (PI) 
controller 

I. INTRODUCTION 

Model Reference Adaptive Control (MRAC) is one of 
the main schemes used in adaptive systems. Recently MRAC 
has received considerable attention, and many new 
approaches have been applied to practical processes [1], [2]. 
In the MRAC scheme, the controller is designed so that the 
plant output converges to the reference model output, based on 
the assumption that the plant can be linearized. This scheme 
is therefore effective for controlling linear plants with 
unknown parameters. However, it may not give assured 
performance when controlling nonlinear plants with unknown 
structure. Fuzzy techniques have been widely used in 
many physical and engineering systems, especially for 
systems with incomplete plant information [3]-[8]. In 
addition, fuzzy logic has been widely applied to 
controller designs for nonlinear systems [9]-[13]. A learning 
approach that combines MRAC with fuzzy systems used as 
reference models and controllers for dynamical systems can be 
found in [14]. A hybrid approach combining a fuzzy controller 
and neural networks for learning-based control is proposed 
in [15]. Fuzzy-approximation-based adaptive control for a 
class of nonlinear time-delay systems with unknown 
nonlinearities and strict-feedback structure is discussed in 
[16]. An Adaptive Network-Based Fuzzy Inference System (ANFIS) 
for speed and position estimation of a permanent-magnet 
synchronous generator is presented in [17]. An adaptive fuzzy 
output feedback control approach for Single-Input-Single-Output 
(SISO) nonlinear systems without state measurements is 
discussed in [18]. Gadoue et al. [19] presented fuzzy logic 
adaptation mechanisms for use in model reference adaptive 
speed-estimation schemes based on rotor flux. An adaptive 
fuzzy-based dynamic feedback tracking controller for 
a large class of strict-feedback nonlinear systems involving 
plant uncertainties and external disturbances is 
discussed in [20]. Chang-Chun Hua et al. [21] investigated an 
adaptive fuzzy-logic system for a class of uncertain nonlinear 
time-delay systems via a dynamic output-feedback approach. The 
development of Adaptive Fuzzy Neural Network Control (AFNNC), 
including direct and indirect frameworks for an n-link robot 
manipulator, to achieve high-precision position tracking is 
discussed in [22]. An-Min Zou et al. [23] proposed a controller 
for the robust backstepping control of a class of nonlinear 
pure-feedback systems using fuzzy logic. A set of fuzzy 
controllers synthesized to stabilize a nonlinear multiple 
time-delay large-scale system is presented in [24]. 

In this paper, a fuzzy logic controller-based model 
reference adaptive intelligent controller is designed by 
placing a fuzzy logic controller in parallel with an MRAC. 
Fuzzy rules are generated from a designed PI controller and 
used to build the fuzzy logic controller. The fuzzy controller 
is connected in parallel with the MRAC, and the two outputs are 
summed and applied to the plant input. The fuzzy logic 
controller compensates for the nonlinearity of the plant, which 
the conventional MRAC does not take into consideration. The 
role of the MRAC is to match the uncertain linearized system to 
a given reference model. Finally, to confirm the effectiveness 
of the proposed method, its simulation results are compared 
with those of the conventional MRAC. 

II. STATEMENT OF THE PROBLEM 

Consider a Single Input Single Output (SISO), 
Linear Time Invariant (LTI) plant with strictly proper 
transfer function 

$$G_p(s) = \frac{y_p(s)}{u_p(s)} = k_p \frac{Z_p(s)}{R_p(s)} \tag{1}$$

where $u_p$ is the plant input and $y_p$ is the plant output. Also, 
the reference model is given by 

$$G_m(s) = \frac{y_m(s)}{r(s)} = k_m \frac{Z_m(s)}{R_m(s)} \tag{2}$$

where $r$ and $y_m$ are the model's input and output. Define 
the output error as 

$$e = y_p - y_m \tag{3}$$

The objective is to design the control input u such 
that the output error e goes to zero asymptotically for 
arbitrary initial conditions, where the reference signal r(t) is 
piecewise continuous and uniformly bounded. 

III. STRUCTURE OF AN MRAC DESIGN 

A. Relative Degree n =1 

As in Ref. [1], the following input and output filters are 
used: 

$$\dot{\omega}_1 = F\omega_1 + g u_p, \qquad \dot{\omega}_2 = F\omega_2 + g y_p \tag{4}$$

where F is an (n-1) x (n-1) stable matrix such that 
det(sI - F) is a Hurwitz polynomial whose roots include 
the zeros of the reference model, and (F, g) is a 
controllable pair. The "regressor" vector is defined as 

$$\omega = [\omega_1^T, \omega_2^T, y_p, r]^T \tag{5}$$

In the standard adaptive control scheme, the control u is 
structured as 

$$u = \theta^T \omega \tag{6}$$

where $\theta = [\theta_1, \theta_2, \theta_3, c_0]^T$ is a vector of adjustable 
parameters, considered as an estimate of a vector of 
unknown system parameters $\theta^*$. 

The dynamics of the tracking error are 

$$e = G_m(s)\, p^* \tilde{\theta}^T \omega \tag{7}$$

where $p^* = k_p / k_m$ and $\tilde{\theta} = \theta(t) - \theta^*$ represents the 
parameter error. In this case, since the transfer function 
between the parameter error $\tilde{\theta}$ and the tracking error e is 
Strictly Positive Real (SPR) [1], the adaptation rule for the 
controller gain $\theta$ is given by 

$$\dot{\theta} = -\Gamma e\, \omega\, \mathrm{sgn}(p^*) \tag{8}$$

where $\Gamma$ is a positive gain. 
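
To make the relative-degree-one scheme concrete, the following minimal sketch discretizes a gradient-type adaptation law with forward Euler. It assumes the law has the standard form d(theta)/dt = -Gamma * e * omega * sgn(p*) and the control is u = theta^T omega; the scalar gain `gamma`, the step size `dt` and the function names are illustrative assumptions, not taken from the paper.

```python
def control(theta, omega):
    """Control signal u = theta^T omega for list-valued vectors."""
    return sum(th * w for th, w in zip(theta, omega))

def adapt_step(theta, omega, e, gamma, sign_p, dt):
    """One forward-Euler step of d(theta)/dt = -gamma * e * omega * sgn(p*).

    theta, omega: lists of equal length (parameter and regressor vectors)
    e: tracking error y_p - y_m at the current instant
    gamma: positive scalar adaptation gain (a matrix gain is also possible)
    sign_p: +1 or -1, the assumed-known sign of p* = k_p / k_m
    """
    return [th - dt * gamma * e * w * sign_p for th, w in zip(theta, omega)]

# Hypothetical values, for illustration only.
theta = [0.5, 0.0, 0.0, 0.0]
omega = [1.0, 2.0, 3.0, 4.0]
u = control(theta, omega)
theta_next = adapt_step(theta, omega, e=1.0, gamma=2.0, sign_p=1, dt=0.1)
```

Each parameter moves against the product of the error and its own regressor component, so a zero tracking error leaves the parameters unchanged.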

B. Relative Degree n =2 

In the standard adaptive control scheme, the control u is 
structured as 

$$u = \theta^T \omega + \dot{\theta}^T \phi = \theta^T \omega - \phi^T \Gamma \phi\, e_1\, \mathrm{sgn}(k_p / k_m) \tag{9}$$

where $\theta = [\theta_1, \theta_2, \theta_3, c_0]^T$ is a vector of adjustable 
parameters, considered as an estimate of a vector of 
unknown system parameters $\theta^*$. 
The dynamics of the tracking error are 

$$e = G_m(s)(s + p_0)\, p^* \tilde{\theta}^T \phi \tag{10}$$

where $p^* = k_p / k_m$ and $\tilde{\theta} = \theta(t) - \theta^*$ 
represents the parameter error. $G_m(s)(s + p_0)$ is strictly 
proper and Strictly Positive Real (SPR). In this case, 
since the transfer function between the parameter error 
$\tilde{\theta}$ and the tracking error e is Strictly Positive Real (SPR) 
[1], the adaptation rule for the controller gain $\theta$ is given 
by 

$$\dot{\theta} = -\Gamma e_1 \phi\, \mathrm{sgn}(k_p / k_m) \tag{11}$$

where $e_1 = y_p - y_m$ and $\Gamma$ is a positive gain. 

The adaptive laws and control schemes developed above are 
based on a plant model that is free from disturbances, noise 
and unmodelled dynamics. These schemes are to be 
implemented on actual plants that most likely deviate 
from the plant models on which their design is based. An 
actual plant may be infinite-dimensional and nonlinear, and its 
measured input and output may be corrupted by noise and 
external disturbances. An adaptive scheme designed with 
conventional MRAC for a disturbance-free plant model may 
become unstable in the presence of small disturbances. 

IV. PI CONTROLLER-BASED MODEL REFERENCE 
ADAPTIVE CONTROLLER 

When a disturbance and a nonlinear component are added to 
the plant input of the conventional model reference adaptive 
controller, the tracking error does not converge to zero and 
the plant output does not track the reference model output. 
Large-amplitude oscillations persist over the entire period of 
the plant output. The disturbance is considered to be a random 
noise signal. To improve the system performance, the PI 
controller-based model reference adaptive controller is 
proposed. In this scheme, the controller is designed as a 
parallel combination of the conventional MRAC system and a PI 
controller. 

The transfer function of the PI controller is generally 
written in the "parallel form" given by (12) or the "ideal 
form" given by (13): 

$$G_{PI}(s) = \frac{U_{pi}(s)}{E(s)} = K_p + \frac{K_i}{s} \tag{12}$$

$$G_{PI}(s) = K_p \left(1 + \frac{1}{T_i s}\right) \tag{13}$$

where $U_{pi}(s)$ is the control signal acting on the error signal 
$E(s)$, $K_p$ is the proportional gain, $K_i$ is the integral gain and 
$T_i$ is the integral time constant. 
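
The two forms describe the same controller when the integral time constant is chosen as T_i = K_p / K_i. The short check below evaluates both forms at one point on the imaginary axis; the gain values and the test frequency are illustrative assumptions.

```python
def pi_parallel(kp, ki, s):
    """Parallel form: Kp + Ki / s."""
    return kp + ki / s

def pi_ideal(kp, ti, s):
    """Ideal form: Kp * (1 + 1 / (Ti * s))."""
    return kp * (1 + 1 / (ti * s))

# Hypothetical gains and test frequency, for illustration only.
kp, ki = 10.0, 75.0
ti = kp / ki              # integral time constant implied by the two gains
s = complex(0.0, 0.7)     # s = j*0.7 rad/s

difference = abs(pi_parallel(kp, ki, s) - pi_ideal(kp, ti, s))
```

With this choice of T_i the two expressions agree up to floating-point rounding, so either form can be used when tuning.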

The block diagram of the PI controller-based model 
reference adaptive controller is shown in Fig. 1. 



Fig. 1 PI controller-based MRAC 

In the PI controller-based model reference adaptive 
controller, the PI controller gains Kp and Ki 
are calculated using the Ziegler-Nichols tuning method. 
The control input U of the plant is given by 

$$U = U_{mr} + U_{pi}, \qquad U_{mr} = \theta^T \omega \tag{14}$$

where Umr is the output of the adaptive controller and Upi 
is the output of the PI controller. The input of the PI 
controller is the error, that is, the difference 
between the plant output yp(t) and the reference model 
output ym(t). In this case also, the disturbance (random 
noise signal) and the nonlinear component are added to the 
input of the plant. The PI controller-based model reference 
adaptive controller effectively reduces the amplitude of 
oscillations of the plant output, although the tracking error 
still does not converge to zero. The PI controller-based model 
reference adaptive controller improves the performance 
compared with the conventional MRAC. 

V. FUZZY LOGIC CONTROLLER-BASED MODEL 
REFERENCE ADAPTIVE CONTROLLER 

To make the system adapt more quickly and 
efficiently than the conventional MRAC system and the PI 
controller-based MRAC system, a new idea is proposed and 
implemented in this paper: the fuzzy logic controller-based 
model reference adaptive controller. In this scheme, the 
controller is designed as a parallel combination of the 
conventional MRAC system and a fuzzy logic controller. The 
error and the change in error are given as inputs to the fuzzy 
logic controller. The rules and membership functions of the 
fuzzy logic controller are formed from the input and output 
waveforms of the PI controller of the designed PI 
controller-based MRAC scheme. The block diagram of the fuzzy 
logic controller-based model reference adaptive controller is 
shown in Fig. 2. 

Fig. 2 Fuzzy logic controller-based MRAC system 

The state model of a linear time-invariant system is given 
in the following form: 

$$\dot{X}(t) = AX(t) + BU(t), \qquad Y(t) = CX(t) + DU(t) \tag{15}$$

This scheme is restricted to the Single Input Single 
Output (SISO) case, noting that the extension to Multiple 
Input Multiple Output (MIMO) is possible. To make the 
plant output yp converge to the reference model output ym, 
the control input U is synthesized as 

$$U = U_{mr} + U_{fc} \tag{16}$$

where Umr is the output of the adaptive controller and Ufc 
is the output of the fuzzy logic controller: 

$$U_{mr} = \theta^T \omega, \qquad \theta = [\theta_1, \theta_2, \theta_3, c_0]^T, \qquad \omega = [\omega_1, \omega_2, y_p, r]^T \tag{17}$$

Stability of the system and adaptability are then achieved 
by an adaptive control law Umr that makes the system state x 
track a suitable reference model such that the error 
e = yp - ym goes to zero asymptotically. The Fuzzy Logic 
Controller (FLC) provides an adaptive control for better 
system performance and a solution for controlling nonlinear 
processes. 

The plant output is compared with the reference model 
output. After the comparison, the error and the change in error 
are calculated and given as inputs to the fuzzy controller. 

The error (e) and error change (ce) are defined as 

$$e(k) = y_m(k) - y_p(k), \qquad ce(k) = e(k) - e(k-1)$$

where ym(k) is the response of the reference model at the kth 
sampling interval, yp(k) is the response of the plant output 
at the kth sampling interval, e(k) is the error signal at the 
kth sampling interval, and ce(k) is the error change signal at 
the kth sampling interval. 
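
The two signals can be computed directly from sampled sequences, as in the sketch below. It assumes ce(0) = 0 because no earlier sample exists; that convention, the function name and the sample values are illustrative assumptions.

```python
def error_signals(ym, yp):
    """Return (e, ce) with e(k) = ym(k) - yp(k) and ce(k) = e(k) - e(k-1).

    ce(0) is set to 0.0 by assumption, since e(-1) is undefined.
    """
    e = [m - p for m, p in zip(ym, yp)]
    ce = [0.0] + [e[k] - e[k - 1] for k in range(1, len(e))]
    return e, ce

# Hypothetical samples of the reference model and plant outputs.
e, ce = error_signals(ym=[1.0, 2.0, 3.0], yp=[0.5, 1.0, 3.0])
```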

The FLC consists of three stages: fuzzification, rule 
execution, and defuzzification. In the first stage, the crisp 
variables e(kT) and ce(kT) are converted into fuzzy 
variables e and ce using triangular membership 
functions. Each fuzzy variable is a member of the subsets 
with a degree of membership varying between '0' 
(non-member) and '1' (full member). In the second stage of the 
FLC, the fuzzy variables e and ce are processed by an 
inference engine that executes a set of control rules 
contained in a rule base. In this paper, the control rules are 
formulated using knowledge of the behavior of the PI 
controller in the designed PI controller-based MRAC system and 
the experience of control engineers. The third stage, 
defuzzification, is the reverse of fuzzification. The FLC 
produces the required output as a linguistic variable (fuzzy 
number), which, according to real-world requirements, has to be 
transformed into a crisp output. As the centroid method is 
considered to be the best-known defuzzification method, it is 
used in the proposed method. 
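
The three stages can be sketched in a few lines. The example below is a minimal Mamdani-style controller with min for the rule premise, max for aggregation and a sampled centroid; the three fuzzy sets, the 3x3 rule table and the normalized universes are hypothetical simplifications, not the rule bases used in the paper.

```python
# Triangular sets (left foot, peak, right foot) on a normalized universe.
SETS = {"N": (-2.0, -1.0, 0.0), "Z": (-1.0, 0.0, 1.0), "P": (0.0, 1.0, 2.0)}

# Hypothetical rule table: (error label, change-in-error label) -> output label.
RULES = {
    ("N", "N"): "N", ("N", "Z"): "N", ("N", "P"): "Z",
    ("Z", "N"): "N", ("Z", "Z"): "Z", ("Z", "P"): "P",
    ("P", "N"): "Z", ("P", "Z"): "P", ("P", "P"): "P",
}

def tri(x, a, b, c):
    """Triangular membership with feet a and c and peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def flc(e, ce, samples=401):
    """Crisp output for crisp inputs e and ce (both clipped to [-1, 1])."""
    e = max(-1.0, min(1.0, e))
    ce = max(-1.0, min(1.0, ce))
    # Stage 1: fuzzification.
    mu_e = {k: tri(e, *v) for k, v in SETS.items()}
    mu_ce = {k: tri(ce, *v) for k, v in SETS.items()}
    # Stage 2: rule execution (min for AND, max across rules).
    strength = {k: 0.0 for k in SETS}
    for (le, lce), lout in RULES.items():
        strength[lout] = max(strength[lout], min(mu_e[le], mu_ce[lce]))
    # Stage 3: centroid defuzzification on a sampled output universe [-2, 2].
    num = den = 0.0
    for i in range(samples):
        u = -2.0 + 4.0 * i / (samples - 1)
        mu = max(min(strength[k], tri(u, *SETS[k])) for k in SETS)
        num += u * mu
        den += mu
    return num / den if den else 0.0

u_fc = flc(0.5, 0.0)
```

A finer output grid gives a closer approximation of the continuous centroid at the cost of more evaluations per control step.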

A. Construction of Fuzzy Rules: 

Consider, as an example, the PI controller input (error), 
change in error and PI controller output waveforms given in 
Fig. 3. 

Using Fig. 3, the fuzzy rules and membership functions for the 
error (e), change in error (ce) and output (Ufc) are 
created. 

The developed fuzzy rules are 

1. If error is 'A' and change in error is 'A' then the output is 'D' 
2. If error is 'B' and change in error is 'B' then the output is 'F' 
3. If error is 'C' and change in error is 'D' then the output is 'H' 
4. If error is 'D' and change in error is 'F' then the output is 'J' 
5. If error is 'E' and change in error is 'C' then the output is 'A' 
6. If error is 'F' and change in error is 'I' then the output is 'K' 
7. If error is 'G' and change in error is 'C' then the output is 'B' 
8. If error is 'H' and change in error is 'H' then the output is 'I' 
9. If error is 'I' and change in error is 'C' then the output is 'C' 
10. If error is 'J' and change in error is 'E' then the output is 'E' 
11. If error is 'K' and change in error is 'G' then the output is 'G' 

Fig. 3 PI controller input (error), change in error and 
PI controller output (Upi) 

The FLC has two inputs, the error e(kT) and the change in error 
ce(kT), and one output, Ufc(kT). The membership functions 
for the fuzzy variables error (e), change in error (ce) and 
output (Ufc) are shown in Fig. 4. 

With the proposed fuzzy logic controller-based MRAC 
method, the tracking error becomes zero within 6 seconds and no 
oscillation occurs. The plant output tracks 
the reference model output. This method performs better than 
the conventional MRAC system and the PI controller-based 
MRAC system. 

VI. RESULTS AND DISCUSSION 

In this section, the results of computer simulations for 
conventional MRAC, PI controller-based MRAC and fuzzy 
logic controller-based MRAC system are reported. The 
results show the effectiveness of the proposed fuzzy logic 
controller-based MRAC scheme and reveal its performance 
superiority to the conventional MRAC technique. 

Example 1: 

In this example, a backlash nonlinearity is 
followed by a linear system, as shown in Fig. 5. 

Fig. 4 (a) Membership functions of the fuzzy variables error (e), (b) change 
in error (ce), and output (Ufc) 

Fig. 5 Nonlinear System 

The disturbance (random noise signal) is also added to 
the input of the plant. 

The system taken for the simulation is the lateral dynamic 
model of a Boeing 747 airplane, whose transfer function is 
given by 

$$G(s) = \frac{-0.5s^3 - 0.2608s^2 - 0.1223s - 0.05832}{s^4 + 0.6358s^3 + 0.9389s^2 + 0.5116s + 0.003674}$$
and the reference model are given by, 

The simulation was carried out with MATLAB and the 
input is chosen as r(t) = 55 sin 0.7t. The initial values of the 
conventional MRAC scheme controller parameters are 
chosen as $\theta(0) = [0.5, 0, 0, 0]^T$. The conventional model 
reference adaptive controller is designed using 
equations (6) and (8). 
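
For orientation, the linear part of the plant can be evaluated at the frequency of the chosen input, 0.7 rad/s. The sketch below uses Horner's rule with complex arithmetic; it reads the 0.5116 term of the denominator as the coefficient of s (an assumption, since the printed equation drops the variable) and ignores the backlash nonlinearity and the noise, so it describes only the open-loop linear dynamics.

```python
import cmath

# Coefficients in descending powers of s (denominator s-term is an assumption).
NUM = [-0.5, -0.2608, -0.1223, -0.05832]
DEN = [1.0, 0.6358, 0.9389, 0.5116, 0.003674]

def polyval(coeffs, s):
    """Evaluate a polynomial with Horner's rule."""
    acc = 0
    for c in coeffs:
        acc = acc * s + c
    return acc

def G(s):
    """Open-loop transfer function of the linear plant model."""
    return polyval(NUM, s) / polyval(DEN, s)

w = 0.7                       # rad/s, frequency of r(t) = 55 sin 0.7t
g = G(complex(0.0, w))
gain, phase = abs(g), cmath.phase(g)   # steady-state gain and phase shift
dc_gain = G(0)
```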

The simulations are done for the conventional MRAC, 
PI controller-based MRAC and fuzzy logic controller-based 
MRAC systems with the random noise disturbance and nonlinear 
component added to the plant. 

In the PI controller-based model reference adaptive 
controller, the PI controller gains Kp and Ki are 
equal to 10 and 75 respectively. In the fuzzy logic 
controller-based model reference adaptive controller, each 
universe of discourse is divided into six fuzzy sets: NH 
(Negative High), NL (Negative Large), ZE (Zero), PS 
(Positive Small), PM (Positive Medium) and PH (Positive 
High). 

The fuzzy variables e and ce are processed by an inference 
engine that executes a set of control rules 
contained in a (6x6) rule base as shown in Fig. 6. The control 
rules are formulated using knowledge of the behavior of the PI 
controller in the designed PI controller-based MRAC scheme 
and the experience of control engineers. 

Fig. 6 Fuzzy rules table (rows and columns labeled NH, NL, ZE, PS, PM, PH) 

The membership functions for the fuzzy variables error (e), 
change in error (ce) and output (Ufc) are shown in Fig. 7. 

Fig. 7 Membership functions for fuzzy variable error (e), change in error 
(ce) and output (Ufc) 



The results for the conventional MRAC, PI controller-based 
MRAC and fuzzy logic controller-based MRAC 
systems are given in Fig. 8. 

Fig. 8 Simulation results: 8(a) Plant output yp(t) (solid lines) and the 
reference model output ym(t) (dotted lines) of the conventional MRAC 
system for the input r(t) = 55 sin 0.7t. 8(b) Plant output yp(t) (solid lines) and 
the reference model output ym(t) (dotted lines) of the PI controller-based 
MRAC scheme for the input r(t) = 55 sin 0.7t. 8(c) Plant output yp(t) (solid 
lines) and the reference model output ym(t) (dotted lines) of the fuzzy 
logic controller-based MRAC scheme for the input r(t) = 55 sin 0.7t. 
8(d) Tracking error e for the conventional MRAC. 8(e) Tracking error e for 
the PI controller-based MRAC scheme. 8(f) Tracking error e for the 
fuzzy logic controller-based MRAC scheme. 

Example 2: 

In this example, a dead-zone nonlinearity is 
followed by a linear system. The disturbance (random noise 
signal) is also added to the input of the plant. A second-order 
system with the transfer function 

$$G(s) = \frac{1}{s^2 + 3s - 10}$$

is studied, and the reference model is chosen as 

$$G_M(s) = \frac{5}{s^2 + 10s + 25}$$

Fig. 9 Fuzzy rules table 

The initial values of the conventional MRAC scheme 
controller parameters are chosen as $\theta(0) = [3, 18, -8, 3]^T$. 
The conventional model reference adaptive controller is 
designed using equations (9) and (11). The simulation 
was carried out with MATLAB and the input is chosen as 
r(t) = 20 + 5 sin 4.9t. In the PI controller-based model 
reference adaptive controller, the PI controller gains Kp 
and Ki are equal to 8 and 85 respectively. 
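
A quick pole calculation shows why this example is demanding. The sketch below assumes the plant denominator reads s^2 + 3s - 10 and the reference model denominator reads s^2 + 10s + 25; the helper name is hypothetical.

```python
import cmath

def quadratic_roots(a, b, c):
    """Roots of a*s^2 + b*s + c = 0, returned as complex numbers."""
    d = cmath.sqrt(b * b - 4 * a * c)
    return (-b + d) / (2 * a), (-b - d) / (2 * a)

plant_poles = quadratic_roots(1, 3, -10)   # s^2 + 3s - 10
model_poles = quadratic_roots(1, 10, 25)   # s^2 + 10s + 25

# A pole with positive real part means the open-loop system is unstable.
plant_unstable = any(p.real > 0 for p in plant_poles)
model_stable = all(p.real < 0 for p in model_poles)
```

Under these readings the plant has a pole in the right half-plane while the reference model has a repeated pole in the left half-plane, so the controller has to stabilize the plant as well as track the model.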

In the fuzzy logic controller-based model reference adaptive 
controller, seven linguistic values are used for the input 
variables error and change in error: Extremely Negative (EN), 
High Negative (HN), Medium Negative (MN), Small Negative (SN), 
Zero (ZE), Medium Positive (MP) and High Positive (HP). 

Seven linguistic values are also used for the output 
variable: Very Low (VL), Low (L), Nearly Low (NL), 
Medium (M), Medium High (MH), High (H) and Extremely 
Positive (EP). 

The control rules are formulated using knowledge of the 
behavior of the PI controller in the designed PI 
controller-based MRAC scheme and the experience of control 
engineers. The fuzzy variables e and ce are processed by an 
inference engine that executes a set of control rules 
contained in a (7x7) rule base as shown in Fig. 9. The 
membership functions for the fuzzy inputs error (e) and change 
in error (ce) and the fuzzy output (Ufc) are shown in Fig. 10. 



Fig. 10 Fuzzy memberships used for simulation 



The results for the conventional MRAC, PI controller-based 
MRAC and fuzzy logic controller-based MRAC 
systems are given in Fig. 11. 

Fig. 11 Simulation results: 11(a) Plant output yp(t) (solid lines) and the 
reference model output ym(t) (dotted lines) of the conventional MRAC 
system for the input r(t) = 20 + 5 sin 4.9t. 11(b) Plant output yp(t) (solid lines) 
and the reference model output ym(t) (dotted lines) of the PI 
controller-based MRAC scheme for the input r(t) = 20 + 5 sin 4.9t. 11(c) Plant 
output yp(t) (solid lines) and the reference model output ym(t) (dotted lines) 
of the fuzzy logic controller-based MRAC scheme for the input 
r(t) = 20 + 5 sin 4.9t. 11(d) Tracking error e for the conventional MRAC. 11(e) 
Tracking error e for the PI controller-based MRAC scheme. 11(f) Tracking 
error e for the fuzzy logic controller-based MRAC scheme. 



When the nonlinear component and the disturbance (random 
noise signal) are added to the plant input of the conventional 
MRAC, the plant output does not track the reference 
model output and large-amplitude oscillations occur over the 
entire plant output signal, as shown in Fig. 8(a) and 11(a); 
the tracking error does not converge to zero, as shown in Fig. 
8(d) and 11(d). When the same disturbance and nonlinear 
component are added to the input of the plant of the PI 
controller-based model reference adaptive controller, the 
performance improves compared with the conventional MRAC and 
the amplitude of oscillations of the plant output is reduced, 
as shown in Fig. 8(b) and 11(b). In this case, however, the 
plant output still does not track the reference model output 
and the tracking error does not converge to zero, as shown in 
Fig. 8(e) and 11(e). When the disturbance (random noise signal) 
and nonlinear component are added to the input of the plant of 
the proposed fuzzy logic controller-based MRAC scheme, the 
plant output tracks the reference model output, as shown in 
Fig. 8(c) and 11(c). The tracking error becomes zero within 6 
seconds with less control effort, as shown in Fig. 8(f) and 
11(f), and no oscillations occur. From the plots, one 
can see clearly that the transient performance, in terms of 
the tracking error and control signal, is significantly 
improved by the proposed MRAC using a fuzzy logic 
controller. The proposed fuzzy logic controller-based 
MRAC scheme shows better control results than the 
conventional MRAC and the PI controller-based MRAC system: 
it has much less error than the conventional method in spite 
of the nonlinearities and disturbance. 

VII. CONCLUSION 

In this paper, the response of the conventional model 
reference adaptive controller is compared with that of the PI 
controller-based MRAC system and of the proposed model 
reference adaptive controller using a fuzzy logic controller. 
The controller is tested on two different plants. The 
proposed fuzzy logic controller-based MRAC 
shows very good tracking results compared with the 
conventional MRAC and the PI controller-based MRAC 
system. Simulations and analyses have shown that the 
transient performance can be substantially improved by 
the proposed MRAC scheme. Thus the proposed intelligent parallel 
controller is found to be extremely effective, efficient and 
useful. 

Abstract — A wireless mobile ad-hoc network (MANET) is a special kind 
of network in which all of the nodes move over time. The topology of 
the network changes as nodes come into proximity of each 
other. A MANET is generally self-configuring, with no stable 
infrastructure in place, and each node should help 
relay packets of neighboring nodes using a multi-hop routing 
mechanism. This mechanism is needed to reach distant destination 
nodes and so avoid dead communication. However, multiple 
traffic "hops" within a wireless mesh network cause a dilemma: 
networks that contain multiple hops become increasingly 
vulnerable to problems such as energy degradation and a rapid 
increase in overhead packets. In recent years, many routing 
protocols have been suggested for communication between mobile 
nodes. One proposed routing approach is to use multiple paths 
and transmit a clone of the packets on each path (i.e., path 
redundancy). Another, more efficient approach is 
selective path redundancy, which chooses among the multiple 
paths and sends packets on an appropriate path. It can improve 
delivery efficiency and cut down network overhead, although it 
also increases processing delays on each layer. This paper 
provides a generic routing framework that immediately adapts 
to a break in the established main route. A fresh route search 
process takes place immediately if a topology change is 
initiated while data is being transmitted. The framework 
maintains route paths consisting of selected active 
next-neighbor nodes that participate in the main route. When 
the main route is broken, data transmission starts immediately, 
so that data is transmitted continuously through the new route 
while the broken route is recovered by the route maintenance 
process. We conduct extensive simulation studies to show that 
the proposed routing protocol provides a backup route when the 
main route is lost, and we analyze the behavior of packet 
transmission. Using the framework, the average rate of 
successful data transmission at various hop counts is 4.5% 
higher than in a network that does not implement it, at the 
cost of an increase of about 22% in overhead packets. With 
respect to average network speed, the proposed protocol 
improves successful data transmission by 10.94% (at average 
network speeds between 10 and 40 km/h). In future research, we 
will extend this framework to wide-area wireless networks and 
compare it with other multipath routing protocols. 



Keywords-Multi-hop; route path; connectivity; metric 

I. Introduction 



A MANET consists of mobile node platforms which are free 
to move in the area. A node is a mobile device 
equipped with built-in wireless communication devices that 
has capabilities similar to an autonomous router. The 
nodes can be located in or on airplanes, ships, cars, rooms, or 
on people as part of personal handheld devices, and there may 
be multiple hosts among them. The system may operate in 
isolation, or have gateways to a fixed network. Every node is 
autonomous. In the future operational mode, multiple coverage 
areas of the network are expected to operate as a global 
"mobile network" connected to the legacy "fixed network". 

The network has several characteristics, e.g. dynamic 
topologies, bandwidth-constrained and energy-constrained 
operation, and limited physical security. These characteristics 
create a set of underlying assumptions and performance 
considerations for protocol design which extend beyond the 
static topology of the fixed network. The design should react 
efficiently to topological changes and traffic demands while 
maintaining effective routing in a mobile networking context. 

All nodes in a MANET rely on batteries or other exhaustible 
energy modules for their energy. As a result of energy 
conservation or some other needs, nodes may stop transmitting 
and/or receiving for arbitrary time periods. A routing protocol 
should be able to accommodate such sleep periods without 
overly adverse consequences. Therefore, routing protocols for 
ad hoc networks consider node mobility, stability and the 
reliability of data transmission. Broadcast is the dominant form 
of message delivery on the wireless network. The AODV 
protocol and most of its extensions use overhearing of 
broadcast RREQ and RREP packets for discovering routes. 

In this paper, we provide a framework that immediately 
adapts to the loss of the established main route. The main 
route can be broken because of either dead nodes or metric 
calculation requirements. The network should be capable of 
starting a backup route search process immediately if a 
topology change is initiated while data is being transmitted. 
This framework keeps the broken route updated using selected 
active neighbor nodes that participate in the main route. When 
the main route is broken, the broken route is recovered 
by the topology maintenance process and data transmission then 
starts immediately through the new route. This is expected to 
reduce the packet transmission delay by establishing the 
backup route while data is transmitted. We conduct extensive 
simulation studies to show that the proposed routing protocol 
provides a backup route when the main route is 
broken, and we analyze the behavior of packet transmission. 
A comparison between a similar network using Link State Routing 
and the generic framework is also conducted. Simulation 
results show that the modified algorithms under different 
formation conditions are more efficient than a network that 
does not deploy the framework. The remainder of this paper is 
organized as follows: Section 2 gives preliminaries and our 
system model. Section 3 discusses the detailed design of the 
simulation model, its notations, and assumptions. A simulation 
algorithm that suits the mobile environment is presented in 
Section 4. A performance evaluation of the generic algorithm 
and a comparison with a similar network using Link State 
Routing are presented in Section 5. Section 6 concludes the 
paper. 
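
The idea of keeping a backup route ready while the main route carries data can be illustrated with a small graph search. The sketch below is not the protocol proposed in this paper: it assumes a static snapshot of undirected links, uses breadth-first search for the main route, and picks a backup that avoids the main route's intermediate nodes. All function names and the example topology are hypothetical.

```python
from collections import deque

def shortest_route(links, src, dst, banned=frozenset()):
    """Breadth-first search over undirected links; returns a node list or None."""
    adj = {}
    for a, b in links:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in sorted(adj.get(node, ())):
            if nxt not in prev and nxt not in banned:
                prev[nxt] = node
                queue.append(nxt)
    return None

def plan_routes(links, src, dst):
    """Return (main, backup); the backup avoids the main route's relay nodes."""
    main = shortest_route(links, src, dst)
    if main is None:
        return None, None
    backup = shortest_route(links, src, dst, banned=frozenset(main[1:-1]))
    return main, backup

def active_route(main, backup, broken_link):
    """Switch to the backup as soon as a link on the main route breaks."""
    hops = set(zip(main, main[1:])) | set(zip(main[1:], main))
    return backup if broken_link in hops else main

# Hypothetical five-link topology between source S and destination D.
links = [("S", "A"), ("A", "D"), ("S", "B"), ("B", "C"), ("C", "D")]
main, backup = plan_routes(links, "S", "D")
current = active_route(main, backup, broken_link=("A", "D"))
```

Because the backup is computed before any failure, switching routes costs only a lookup, which is the delay saving the framework aims for; a real protocol would also have to refresh both routes as nodes move.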

II. Related Works 

A wireless network is generally set up with a centralized
access point to provide a high level of connectivity in a
certain area. The access point has knowledge of all devices in
its area, and routing to nodes is done in a table-driven manner
[1][2][5]. Nemoto [2] presented a technical review of wireless
mesh network products that implement the IEEE 802.11 standard
through the installation of fixed wireless mesh network nodes.
The review evaluates network performance from the viewpoint
of using outdoor Muni-WiFi devices alongside legacy LAN
technology inside a corporate network. It reports the
performance of the network access layer, i.e., voice and TCP
data transmission in terms of throughput, response time
between mesh nodes, and communication delay in multi-hop
transmission.

However, Nemoto [2] considered only networks with a static
topology. With recent advances in computer and wireless
communications technologies, advanced wireless mobile
devices are expected to see increasingly widespread use and
application. The vision of future mobile ad hoc networking is
to support robust and efficient operation in mobile wireless
networks by incorporating routing functionality, so that
networks can be dynamic and rapidly changing, with random,
multi-hop topologies that are likely composed of relatively
bandwidth-constrained wireless links. Supporting this form of
host mobility requires address management, protocol
interoperability enhancements, and the like.

In such a dynamic network, broadcasting plays a critical role,
especially in vehicular communication, where a large number of
nodes are moving while sending large packets. In a wireless
network where nodes communicate with each other using
broadcast messages, each receiver collects information from
all transmitting nodes within its coverage neighborhood, which
makes it aware of its immediate surroundings so that it can
respond before re-transmitting a packet. Several transmissions
may be redundant (overhead) during broadcast. This redundancy
causes the broadcast storm problem [8], in which redundant
packets cause contention and collision and consume a
significant percentage of the available energy resources. Thus,
routing protocols should be able to respond to these changes
using minimum signaling, taking into account energy as a
parameter distributed across the network.



Routing is one of the key network protocols in
telecommunication networks. It selects the paths along which
traffic flows from all sources to their final destinations.
Between sources and final destinations, there are nodes, areas,
and active traffic. There are proposals to allow flexible
multipath routing in the Internet, but single-path routing is
primarily used: one user (source-final destination pair) uses
only one selected path from the source to the destination, with
the exception that traffic may be split evenly among equal-cost
paths, e.g., in the Open Shortest Path First (OSPF) protocol,
the current routing protocol within an AS.

In single-path routing protocols, route maintenance can run
concurrently with data transmission, but it takes effect only
when a route fails or is broken off. Data transmission
therefore stops while the new route is established, causing
data transmission delay. Multipath routing protocols, on the
other hand, perform the route maintenance process even if only
one of the multiple routes fails. To perform route maintenance
before all routes fail, the network must always maintain
multiple routes. This can reduce data transmission delays
caused by link failure. However, route maintenance can lead to
higher overhead traffic. Several routing implementations are
based on AODV; typical examples are the AOMDV, AODVM, and
AODV-BR protocols.

The AODV-BR [10] protocol repairs the main route when it is
broken by using neighbor nodes around the route to bypass the
break. In this protocol, neighbor nodes overhear RREP packets
during the route initiation process in order to establish and
maintain backup routes. If part of the main route is broken,
nodes broadcast RERR packets to neighbor nodes. When neighbor
nodes receive this packet, they establish an alternate route
using the information contained in the previously overheard
RREP packets.

The AOMDV [7] protocol establishes link-disjoint paths in
the network. When nodes receive an RREQ packet from the
sender node, the AOMDV protocol stores all RREQ packets, so
each node maintains a list of neighboring hops, where each RREQ
packet contains information about the neighbor node of the
sender node. If the first hop of a received RREQ packet
duplicates the node's own first hop, the RREQ packet is
discarded. At the final destination, an RREP packet is sent for
each received RREQ packet. The multiple routes are formed by
RREP packets that follow the reverse routes to the source node
that have already been set up in intermediate nodes.

In the AODVM [9] protocol, the intermediate nodes record all
received RREQ packets in the routing table. They do not
discard duplicate RREQ packets. The final destination node
sends an RREP for every received RREQ packet. An intermediate
node forwards a received RREP packet to the neighbor in the
routing table that leads to the source node. Each node cannot
participate in more than one route.

III. Simulation Model, Notations, And Assumptions

In this paper, we propose a framework for an adaptive routing
protocol based on the AODV protocol and a broadcast
mechanism. The AODV protocol is configured in a network whose
topology changes randomly because the mobile nodes move
freely. Under these circumstances, node failure occurs
frequently. AODV should therefore be able to sense the path
through the nodes between source and final destination, to
prevent path breakage caused by node failure. The framework
starts a route search process immediately after the
established main route is broken. It uses RREQ and RREP
packets, which are broadcast to appropriate active neighbor
nodes so that they can be incorporated into the main route of
the source-final destination path. Such adaptive single-hop
routing may consume less energy than multi-hop routing. In
addition, the framework is advantageous when transmitting
larger packets, where the fragmented packets should reach the
final destination with a higher transmission success rate.

The proposed framework assumes that nodes are capable of
dynamically adjusting their relay nodes on a per-move-step
basis. This behavior is similar to that of MANET routing
protocols (e.g., AODV, DSR and TORA). One common property of
these routing protocols is that they discover routes using
broadcast flooding, with a distance metric chosen to minimize
the number of relay nodes between any source and final
destination pair.

A. The Model 

The simulation covers a single area of homogeneous nodes that
communicate with each other using the broadcast services of
IEEE 802.11. Nodes take on different roles in the simulation,
namely initiator/source node, receiver node, sender node,
destination node, and final destination node. The
initiator/source node is the node that initiates the
transmission of a packet. The packet can be either a route
discovery packet or a data packet. Like other nodes, the
initiator is always moving with random direction, speed, and
distance. While it is moving, the initiator node continuously
senses its neighbors to maintain connectivity. A receiver node
is a node that can be reached by the source/sender node. Two
nodes are defined as neighbors if one is located within the
distance radius range of the other. Initially, a node senses
its neighbors before any data packet needs to be transmitted.
Neighbor nodes within coverage always receive the packets
broadcast by the sender. A destination node is the receiver
node selected in multi-hop transmission to relay packets to the
next receiver node. The final destination node is the node that
is the end destination of the packets.

The wireless link channel is assumed to have no physical
noise; i.e., errors in packet reception due to fading and other
external interference are not considered a serious problem.
Packets from sender to receiver are transmitted as long as the
bandwidth capacity is sufficient and the received
signal-to-noise ratio (SNR) is above a certain minimum value.
Each packet received is then acknowledged at the link layer
and de-encapsulated at the higher layer. Each node is capable
of measuring the received SNR by analyzing the overhead of
packets. A constant bit error rate (BER) is defined for the
whole network. Whenever a packet is about to be sent, a random
number is generated and compared to the packet's CRC. If the
random number is greater, the message is received; otherwise
it is lost. The default value of the BER is 0, which means
there is no packet loss due to physical link errors.



The layered concept of networking was developed to
accommodate changes in local layer protocol mechanisms. Each
layer is responsible for a different function of the network
and passes information up and down to the adjacent layers as
data is processed. Among the seven layers in the OSI
reference model, the link layer, network layer, and transport
layer are the three main layers of the network, and the
framework is configured in those layers. Original packets are
initiated at the Protocol layer and then delivered
sequentially to the next layer, under the assumption that
fragmented packets are randomly distributed. The simulation
models each layer with finite buffers. Because the buffers are
limited, packets are queued according to the drop-tail queuing
principle. When a node has packets to transmit, they are
queued provided the queue contains fewer than K elements
(K > 1). To increase the randomization of the simulation
process, the simulation introduces delays in some common
network processes, such as message transmission delay,
processing delay, and timeouts. As a result, each run of a
simulation produces different results. The packets exchanged
between sender and receiver have a fixed transmission rate X
based on a Poisson distribution. Nodes that have packets
queued are able to transmit them over each available
bi-directional link channel.
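
The drop-tail buffering described above can be illustrated with a
minimal Python sketch. The class name DropTailQueue and its methods
are hypothetical, chosen only for this illustration; they are not
taken from the simulator used in this paper.

```python
from collections import deque


class DropTailQueue:
    """Finite FIFO buffer: an arriving packet is dropped when the buffer is full."""

    def __init__(self, capacity):
        # The text requires K > 1 for the buffer size K.
        if capacity <= 1:
            raise ValueError("capacity K must be greater than 1")
        self.capacity = capacity
        self._items = deque()
        self.dropped = 0  # number of packets lost to overflow

    def enqueue(self, packet):
        """Queue the packet if fewer than K packets are waiting; otherwise drop it."""
        if len(self._items) < self.capacity:
            self._items.append(packet)
            return True
        self.dropped += 1
        return False

    def dequeue(self):
        """Return the oldest queued packet, or None if the queue is empty."""
        return self._items.popleft() if self._items else None

    def __len__(self):
        return len(self._items)
```

With K = 2, a third arrival is rejected and counted in `dropped`,
while packets already queued leave in arrival order.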

Energy is the power reserve kept in each node. The energy
required to transmit a packet from node A to node B equals the
energy required from node B to node A if and only if the
distance and the packet size are the same. The coverage range
of each node is a perfectly symmetric unit disk
(omni-directional): if the distance $d_{xy}$ between nodes x
and y is less than the coverage radius, x and y can see each
other. This assumption may be acceptable when interference in
both directions is similar in space and time, which is not
always the case. Usually an interference-free Media Access
Control (MAC) protocol such as Carrier Sense Multiple Access
(CSMA) may exist. Heinzelman et al. assumed that the radio
dissipates $E_{elec} = 50$ nJ/bit to run the transmitter or
receiver circuitry and $\epsilon_{amp} = 100$ pJ/bit/m$^2$ for
the transmit amplifier [5][6]. The radio model is shown in
Fig. 1 below.




Figure 1: The radio model.

Thus, to transmit a k-bit message over a distance d using this
radio model, the radio expends:

$E_{TX}(k, d) = E_{elec} \cdot k + \epsilon_{amp} \cdot k \cdot d^2$ (1)

and to receive this message, the radio expends:

$E_{RX}(k) = E_{elec} \cdot k$ (2)
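
As a worked illustration of Eqs. (1) and (2), the following Python
sketch evaluates the radio model with the constants quoted above
(50 nJ/bit for the circuitry and 100 pJ/bit/m^2 for the amplifier).
The function names are hypothetical and serve only this example.

```python
E_ELEC = 50e-9     # J/bit, transmitter or receiver circuitry
EPS_AMP = 100e-12  # J/bit/m^2, transmit amplifier


def tx_energy(k_bits, d_m):
    """Energy in joules to transmit k bits over d meters, Eq. (1)."""
    return E_ELEC * k_bits + EPS_AMP * k_bits * d_m ** 2


def rx_energy(k_bits):
    """Energy in joules to receive k bits, Eq. (2)."""
    return E_ELEC * k_bits
```

For a 1500-byte packet (12000 bits) sent over 100 m, the circuitry
term is 0.6 mJ and the amplifier term is 12 mJ, so transmission
costs about 12.6 mJ while reception costs about 0.6 mJ. Because the
amplifier term grows with the square of the distance, it dominates
at long range.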

The energy behavior of a node is defined as follows:

• During idle time, a node does not spend energy. This
assumption has been shown to be untrue, because being idle
can be as costly as receiving data, but it can still be
made in most experiments, since the most important factor
is the overhead in terms of message exchange and its
associated cost.

• Each node is assumed to have one radio for general
messages. This main radio is used in all operations when
the node is in active mode, and to send and receive
control packets. When the radio is turned off, no messages
are received and no energy is used.

• The energy distribution among nodes can be constant,
normally distributed, Poisson distributed, or uniformly
distributed.

B. Immediate Awareness Routing Algorithm 

The core algorithm is developed from a static mode (e.g.,
sensor networks). The enhancements for mobility are then
detailed, in support of topology development and routing
maintenance. We show our methodology on a tree network. The
tree topology decomposes the paths between source and final
destination into several route paths. The algorithm
underestimates the interference among the route paths. The
algorithm starts operating with network topology development.
Routing maintenance is responsible for detecting a break in
the main route path during data transmission.

The network topology is initiated using the broadcast
mechanism and propagated node to node based on a routing
metric approach. Propagation covers topology development,
route discovery, and data transmission. Each source injects a
single large packet, fragmented into multiple packets, which
traverse the network until they reach the final destination.
Packets waiting for an opportunity to be transmitted are
queued at each node on the path. This model applies not only
to direct communication (one-hop transmission) but also to
multi-hop transmission. In that case, when the source and final
destination nodes are outside each other's maximum
transmission range, the source node is able to discover
multi-hop routes while data continues to be transmitted.

Topology development is proactive; it discovers and
disseminates link state information. It involves transmitting
and receiving HELLO packets, REPLY packets, CONFIRM packets,
and so on, most of which are redundant. Each such packet that
is successfully received by the link layer updates an entry in
the neighbor table, which caches information about the
surrounding nodes. HELLO packets and the corresponding REPLYs
carry the fields [ID, hop, energy, time, throughput,
direction], where ID is the unique identifier of the neighbor
node (its IP address), hop is a counter incremented each time
the packet reaches a relay node, energy is the currently
available energy level needed to ensure communication with the
neighbor node, time is the current time at which the event is
executed, throughput is the total number of bits that can be
pushed through the available link given its bandwidth and
latency, and direction is the heading in which the node will
move to cover its distance.
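
The packet fields listed above can be pictured as a small record. The
Python sketch below is only an illustration: the class and function
names are hypothetical, and the field types and units (joules,
seconds, bits per second, degrees) are assumptions, since the paper
does not specify them.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class HelloPacket:
    node_id: str       # ID: unique neighbor identifier (IP address)
    hop: int           # incremented each time the packet reaches a relay
    energy: float      # currently available energy (unit assumed: J)
    time: float        # time at which the event is executed (assumed: s)
    throughput: float  # bits the link can carry (assumed: bit/s)
    direction: float   # heading of the node (assumed: degrees)


def relay(packet):
    """Return the copy of the packet that a relay node forwards (hop + 1)."""
    return replace(packet, hop=packet.hop + 1)


def update_neighbor_table(table, packet):
    """Cache the latest information heard from a neighbor, keyed by its ID."""
    table[packet.node_id] = packet
```

Keeping the record immutable means that a relay forwards a new copy
with the incremented hop count, while the entry cached in the
neighbor table is left untouched.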



Routing maintenance is responsible for performing the route
optimization operation that leads to the discovery of route
changes. The algorithm performs two basic operations: it
initiates packets, which determine whether a route
optimization between two nodes is needed and set up the
broadcast mechanism; and it determines when to transmit
routing maintenance packets. The framework optimizes routes
through a sequence of steps that converges to an optimum
route.

When a node first starts, it only knows of its immediate 
neighbors, and the direct cost involved in reaching them. (This 
information, the list of destinations, the total cost to each, and 
the next hop to send data to get there, makes up the routing 
table, or distance table.) Each node, on a regular basis, sends
broadcast packets to its neighbors to obtain the costs of all
destinations. The neighboring nodes examine this information,
compare it to what they already know, and update their own
routing tables accordingly. Over time, all the nodes in the
network will discover the best next hop for all destinations,
and the best total cost. When one of the nodes involved
changes, those nodes which used it as their next hop for
certain destinations discard those entries, and create new
routing-table information. They
then pass this information to all adjacent nodes, which then 
repeat the process. All the nodes in the network receive the 
updated information, and discover new paths to all the 
destinations which they can still reach. 
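
The table update described in this paragraph follows the usual
distance-vector pattern. The Python sketch below is a minimal
illustration under that assumption; the function names and the table
layout (destination mapped to a pair of total cost and next hop) are
hypothetical and not taken from the simulator.

```python
def merge_advertisement(table, neighbor, link_cost, advertised):
    """Merge the costs advertised by a neighbor into the local routing table.

    table:      dict mapping destination -> (total cost, next hop)
    advertised: dict mapping destination -> cost as seen by the neighbor
    Returns True if any entry changed, so the node knows to re-advertise.
    """
    changed = False
    for dest, cost in advertised.items():
        candidate = link_cost + cost
        current = table.get(dest)
        # Keep the route only if it is new or strictly cheaper.
        if current is None or candidate < current[0]:
            table[dest] = (candidate, neighbor)
            changed = True
    return changed


def drop_next_hop(table, failed):
    """Discard every entry that used the failed node as its next hop."""
    stale = [dest for dest, (_, hop) in table.items() if hop == failed]
    for dest in stale:
        del table[dest]
    return stale
```

A node that sees `merge_advertisement` return True, or that removes
entries with `drop_next_hop`, would then pass its updated table to
its adjacent nodes, which repeat the process.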

During this sequence, the relay node is determined from
relevant information gathered from neighbor nodes. After
redundant packets are omitted, and based on the calculated
metric value, a relay node set is chosen (i.e., a small set of
nodes that may forward the broadcast packet) to achieve a high
delivery ratio under a given metric. This means that only
selected neighbors are able to forward the packet to the next
neighbors. The neighbors or new relays added to a route during
an iteration depend strongly on the relays found in the
previous iteration. The set can be selected dynamically (based
on both topology and broadcast state information). To simulate
the proposed routing, the relay node set forms a connected
dominating set (CDS) and achieves full coverage of the
connected network. It is possible that the route found in the
first iteration, which appears to have the best metric value,
is not the route that achieves the optimum topology with the
optimum delay path.

Several relay nodes may exist between the source and the final
destination, so the source node must choose the one that
provides the highest metric value on the path leading to the
final destination. Multiple packets are sent to that single
(next) relay node. Transmitting multiple route-redirect
packets wastes bandwidth and network resources (it increases
overhead packets). For sparsely populated networks, this may
not be a problem. However, it is an issue in densely populated
networks, where several potential nodes can be chosen [4]. The
simulation creates a dense environment, because densely
populated nodes are desired to make alternate routing
possible.

Routing maintenance is the part of the framework that
addresses this immediate awareness of path changes. It gives
priority in executing an update routing maintenance packet to
the potential neighbor node that computes the highest
energy-distance route metric value. After receiving an update
routing maintenance packet, a node modifies its routing table,
setting the source of the received packet as the next-hop node
for the specific sender-destination route path. To execute a
preferred event among sequentially distributed events, we
apply a different event execution time after the triggering
event takes place. The lower and upper bounds of the queuing
interval are set such that events do not interfere with
predefined timers used by other events for layers and
modification events.

The proposed scheme for routing maintenance is as follows.
First, when a main route failure is detected, a RouteERROR
packet is sent back to the source and to the nodes
participating in the path, so that they can detect the
disconnection of the main route. When a node receives the
RouteERROR packet, it checks the level flag in its routing
table and determines whether it is near to or far from the
first relay of the main route. After receiving the RouteERROR
packet, the closest node reinitiates the route discovery
process for the main route, and at the same time keeps the
packets it has already received and reconfigures its path
configuration. The dying node (i.e., the node that caused the
route path to break) stops receiving new packets. It is
responsible for transmitting the packets it has already
received to the destination node before it goes silent (and
OFF). Immediately after the broken path is successfully
reconnected, the closest node starts data transmission through
the backup route.

In AOMDV and AODVM, data transmission starts after the path
is found [4]. This causes overhead at the first route
discovery and delays the first data transmission. The proposed
framework addresses these problems by starting data
transmission at some interval initialTime immediately after
the route discovery process starts. To establish a main route,
a source node broadcasts a HELLO packet with a level value of
zero to its neighbor nodes. When intermediate nodes receive
the packet, they store the level value and information about
the source node in the neighbor table. Neighbor nodes transmit
the corresponding REPLY packet, which is sent back to the
source node through the reverse path, along with the
information they hold. Intermediate nodes that receive the
REPLY packet increment the level value in the neighbor table.
Incrementing the level value allows the protocol to mark a
node as part of the selected route path. When the source node
receives the REPLY packet, the main route is established. The
source node then broadcasts confirmation packets about this
selection to its neighbor nodes. Each source node broadcasts
HELLO packets with a certain level value to the surrounding
nodes. Consequently, nodes belonging to the main route hold
different level values: the level value increases by one with
each relay away from the source node. A level flag of zero
indicates the source node of the main route, and a value of
one indicates the next relay in the main route.

After two hop iterations, the source node starts data
transmission. When a receiver receives a data packet from
another node, it de-encapsulates the packet, checks the
packet's destination, and searches the routing table to see
whether a route toward the destination node exists. If not,
the node searches the neighbor table to see whether
information regarding the destination node is available. If
that also fails, the node gives up and reports this to its
gateway. Otherwise, the node processes the received packet,
and the iteration proceeds as described previously. When nodes
are mobile and no data packets are available for transmission,
a source node is required to transmit explicit signaling
packets to maintain the topology.
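
The lookup order that a receiver follows can be sketched in a few
lines of Python. The function name, the return values, and the
dictionary-based tables are hypothetical and serve only to
illustrate the order of the checks described above.

```python
def handle_packet(destination, routing_table, neighbor_table):
    """Decide what a receiver does with a de-encapsulated data packet.

    routing_table:  dict mapping destination -> next hop on a known route
    neighbor_table: dict mapping neighbor id -> cached neighbor information
    """
    # 1. Prefer an existing route toward the destination.
    if destination in routing_table:
        return ("forward", routing_table[destination])
    # 2. Otherwise check whether the destination is a known neighbor.
    if destination in neighbor_table:
        return ("forward", destination)
    # 3. No information at all: give up and inform the gateway.
    return ("report_to_gateway", None)
```

For example, with a routing table {"FD": "n3"} and a neighbor table
containing "n7", a packet for FD is forwarded to n3, a packet for n7
is delivered directly, and a packet for any other destination is
reported to the gateway.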




Figure 2. Route path maintenance steps. (a) At the time the path is broken
off. (b) The repaired path (backup route) is established.

Fig. 2 shows an example of route maintenance when a new source
node SC performs the route discovery process toward its final
destination node FD (a route is already established between
source node SC and final destination node FD). The main route
(SC -> 1 -> 2 -> 3 -> 4 -> FD) between SC and FD is
disconnected, and the backup route (SC -> 1 -> a -> b -> 3 ->
4 -> FD) is then established between SC and FD.

We built a Java network simulator to evaluate this
framework. The simulator supports the physical, link, and
network layers for single-hop and multi-hop ad hoc networks. We
assume that the IEEE 802.11 Distributed Coordination Function
(DCF), a MAC protocol that uses Carrier Sense Multiple Access
with Collision Avoidance (CSMA/CA), is already deployed. A
packet is successfully received by the receiver's interface
if its SNR is above a certain minimum value; otherwise the
packet cannot be distinguished from background
noise/interference. Packets are transmitted through the
physical layer in accordance with a Poisson distribution.
Communication between two nodes in IEEE 802.11 uses RTS-CTS
signaling before the actual data transmission takes place. The
simulation models this by randomly sensing the link's
condition. The simulator uses a two-step propagation model to
simulate interactive propagation in the operation of the
protocol in a dynamic environment. The propagation model is
appropriate for outdoor environments where a line-of-sight
path exists between the transmitter and receiver nodes and the
antennas are omni-directional.

Packets, either fragmented or not, flow through the layers at
every time slot. The length of the active periods (denoted by
a random variable) is drawn at random using the Mersenne
Twister algorithm. The mean transmission rate and arrival rate
of packets can be controlled by changing the value of "p" (a
Poisson distribution value). In the arrival process, the packet
stream arriving at each node is a series of active and idle
periods. A received packet is processed by the layering
module, and one of the following actions is taken: (i) the
packet is passed to the higher layers if both the MAC and IP
addresses match; (ii) the packet is dropped if neither the MAC
nor the IP address matches; or (iii) the packet is forwarded
to another node when only the MAC address matches. In the last
case, the node searches the routing table to find the next
route node with the higher metric value on the way to the
next destination node.
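
The three actions taken by the layering module can be written as a
small decision function. The Python sketch below uses hypothetical
names. The case in which only the IP address matches is not covered
by the text; treating it as a drop is an assumption made here for
completeness.

```python
def classify(packet_mac, packet_ip, my_mac, my_ip):
    """Return the action for a received packet: deliver, forward, or drop."""
    mac_ok = packet_mac == my_mac
    ip_ok = packet_ip == my_ip
    if mac_ok and ip_ok:
        return "deliver"  # (i) pass the packet to the higher layers
    if mac_ok:
        return "forward"  # (iii) only the MAC address matches
    # (ii) neither address matches. The IP-only case is also dropped here,
    # which is an assumption; the text does not specify it.
    return "drop"


def next_relay(candidates):
    """Pick the relay with the highest metric from a dict of id -> metric."""
    return max(candidates, key=candidates.get) if candidates else None
```

When `classify` returns "forward", a node would call `next_relay`
on the metric values found in its routing table to choose where the
packet goes next.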

IV. Performance Evaluation

Our simulation modeled a network of 50 nodes placed
randomly with a uniform distribution within an area of 300 m x
300 m. Each node randomly selects a new position and moves
toward that location at a certain speed. The average network
speed is selected from values between 5 and 50 m/s. Once nodes
reach the position, they become stationary for a predefined
pause time and then select another position after a delay.
This process continues until the end of the simulation. The
source nodes were fixed, while final destination nodes were
selected randomly over the network. Traffic was modeled using
CBR (constant-bit-rate) sources with 1500-byte data packets
and a Poisson-distributed traffic rate of five packets per
second. Simulation scenarios are batched by the number of
initiators/sources and by speed. We compare the framework with
a similar LSR network to better understand the various
tradeoffs and limitations of the algorithm. The similar LSR
network was selected because it is simple to deploy and can be
used to analyze large-scale packet processing over a known
network topology.
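
The node movement described in this section resembles a random
waypoint pattern. The Python sketch below illustrates one movement
step under that reading; the function names and the time step dt are
assumptions, and pause handling is left to the caller.

```python
import math
import random


def pick_waypoint(rng, side=300.0):
    """Draw a uniformly random target position inside the square area."""
    return (rng.uniform(0.0, side), rng.uniform(0.0, side))


def move_toward(pos, target, speed, dt):
    """Advance pos toward target by at most speed * dt meters.

    Returns (new_position, arrived). On arrival the caller would hold the
    node still for the pause time and then pick the next waypoint.
    """
    dx, dy = target[0] - pos[0], target[1] - pos[1]
    dist = math.hypot(dx, dy)
    step = speed * dt
    if dist <= step:
        return target, True
    return (pos[0] + dx / dist * step, pos[1] + dy / dist * step), False
```

A simulation loop would call `move_toward` once per time slot for
each node, using a speed drawn from the 5 to 50 m/s range, and call
`pick_waypoint` again after each pause.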

A similar (LSR) network generates full routing tables in
advance, so that all nodes in the network are aware of the
distance level and routes to all other nodes in the network.
This network can compute the optimum metric with the shortest
distance to the next relay node by listening to the replies to
topology construction and topology maintenance packets
transmitted by the neighbors. This network operation requires
each node in the network to broadcast a routing packet. The
broadcast packets contain information about the distance
metric of all known destinations. Each node floods the network
with information about which other nodes it can connect to,
and the received packets may need to be forwarded by other
nodes to propagate through the entire network. After
collecting packets from all nodes of the network, any node
should be capable of computing optimum routes to any other
node in the network. Each node then independently assembles
this information into a tree. Using this tree, each node
independently determines the least-cost path from itself to
every other node using a standard shortest-path (distance)
algorithm. The number of propagation iterations needed to
flood the entire network depends mainly on the density of
nodes in the network. The result is a tree rooted at the source
node such that the path through the tree from the root to any
other node is the least-cost path to that node. This tree then
serves to construct the routing table, which specifies the
best next hop to get from the current node to any other node.
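
The text refers only to "a standard shortest paths algorithm". As an
illustration, the Python sketch below uses Dijkstra's algorithm, one
common choice, to derive both the least-cost distances and the
next-hop table from a link-state map. The function name and the
graph layout are hypothetical.

```python
import heapq


def shortest_path_tree(graph, source):
    """Compute least-cost distances and first hops from source.

    graph: dict mapping node -> dict of neighbor -> non-negative link cost
    Returns (dist, next_hop), where next_hop[n] is the neighbor of source
    on the least-cost path toward n.
    """
    dist = {source: 0.0}
    next_hop = {}
    done = set()
    heap = [(0.0, source, None)]  # (cost so far, node, first hop used)
    while heap:
        d, node, first = heapq.heappop(heap)
        if node in done:
            continue  # stale entry left over from an earlier, costlier path
        done.add(node)
        if first is not None:
            next_hop[node] = first
        for nbr, cost in graph.get(node, {}).items():
            nd = d + cost
            if nbr not in done and nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                # Direct neighbors of the source are their own first hop.
                hop = nbr if node == source else first
                heapq.heappush(heap, (nd, nbr, hop))
    return dist, next_hop
```

The returned `next_hop` mapping plays the role of the routing table
described above: for each reachable node, it names the best neighbor
to send to from the current node.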



The measurements in the experiment comprise the successful
data transmission rate from source to destination nodes and
the control packet overhead for route discovery and route
maintenance. The graphs show the results of experiments for
various pause times.

The successful packet transmission rate indicates the extent
to which the destination node received the packets sent from
the source node. With the framework, successful data
transmission is about 4.5% higher than in the network that
does not implement it. The successful packet transmission
rate is shown in Fig. 3.

The proposed protocol provides higher data transmission
rates than the AODV protocol. When a route fails in the AODV
protocol, the protocol performs the route discovery process
again from the source node. In this research, routes are
repaired from the intermediate nodes (connected to the failed
link) that participate in the path leading to the destination
node. The proposed protocol has a higher packet transmission
rate than the AODV protocol because it reduces the packet loss
that occurs during the route rediscovery process and needs to
wait only a short delay for the route to be reinitiated.

Figure 3. The successful packet transmission rates.

Figure 4. Establishment of backup route in data transmission at different
network speed.

Fig. 4 compares successful data transmission at different
speeds, when the main route is broken, between the network
that implements the framework and the one that does not. The
proposed protocol improved successful data transmission (i.e.,
backing up the main route) by 10.94%.

When the main route in the network is broken off, the proposed
protocol finds a new route by starting a route discovery
process at the closest victim node, and it delays data
transmission for a while. This incurs the routing overhead of
both the main route and backup route discovery processes.
Control packets are the packets used for establishing routes,
while data packets are the actual packets used for data
transmission. Routing overhead is shown in Fig. 5. Overhead
packets increase by about 22% in the network that implements
the routing framework.

V. Conclusion And Future Work

In this paper, we proposed a routing protocol, based on the
AODV protocol for MANETs, that establishes routes capable of
adapting to a broken path between source and final destination
nodes. The new protocol does not add excessive overhead
compared to the conventional AODV protocol. The protocol also
sends data immediately after the main route is successfully
recovered, to reduce the data transmission delay. During
execution, besides discovering backup routes when the main
route is broken off, the framework always maintains the route
using the topology maintenance process. The main difficulty,
however, is in identifying the bottlenecks in the network. The
results obtained in this simulation are compared against a
similar LSR network with the AODV protocol. It is interesting
to note that the routing policy, which was designed primarily
for achieving higher successful data transmission in a single
wireless network area, can also be engineered to achieve good
delay performance in multiple wireless network areas. In
future research, we will simulate this framework in a wide
area wireless network and compare it with other multipath
routing protocols such as AOMDV and AODVM.

Abstract- Some real-life data are associated with the duration
of events instead of point events. The most common example of
such data is data from the cellular industry, where each
transaction is associated with a time interval. Mining maximal
fuzzy intervals from such data allows the user to group
transactions with similar behavior together. Earlier works were
devoted to mining frequent as well as maximal frequent
non-fuzzy intervals. We propose here a method of mining maximal
dense fuzzy intervals, where the density of an interval is
quite similar to the frequency of an interval.

Keywords- Frequent intervals, Maximal frequent intervals, Density
of a fuzzy interval, Minimum density, Contribution (vote) of a
transaction on a fuzzy interval, Join of two fuzzy intervals.



I INTRODUCTION 

Among the various types of data mining applications, analysis 
of transactional data has been considered important. One 
important extension of this mining problem is to include a 
temporal dimension. Most of the earlier works done in this area 
do not take into account the time factor. By taking into account 
the time aspect, more interesting patterns that are time dependent 
can be extracted. Recently data mining in temporal data sets has 
arisen as an important data mining problem [2], [10]. 

Many real life problems are associated with duration events 
instead of point events. In this paper we consider such 
datasets, i.e. datasets having time intervals. Such datasets are 
called temporal interval datasets. A record in such data 
typically consists of the starting time and ending time (or the 
length of the transaction) in addition to other fields. In [5] an 
algorithm for mining maximal frequent intervals from such data 
sets has been given. 

In practice, however, people mostly make statements using vague 
terms like early morning, late evening etc. instead of mentioning 
strict time intervals. There is no strict boundary separating 
early morning from morning. To represent such vague terms, fuzzy 
sets are required. In this paper we discuss the problem of mining 
dense intervals using a fuzzy concept. The objective of this paper 
is threefold. First, we propose the definition of the density of a 
fuzzy interval over a transactional dataset (where each 
transaction is associated with a time duration). Secondly, we 
define a join operation on fuzzy intervals, and lastly we propose 
an algorithm to mine maximal dense fuzzy intervals. We define the 
amount of contribution (also called vote) of a transaction t 
associated with the time interval [t_1, t_2] for a given fuzzy 
interval A as the ratio of the area bounded by the membership 
function A(x) (associated with the fuzzy interval) and the real 
line included within the interval [t_1, t_2] to the total area 
covered by A(x) and the real line. If the average of the votes of 
all the transactions for a fuzzy interval A exceeds a pre-defined 
threshold, then the fuzzy interval is called a dense fuzzy 
interval. Similarly, a dense fuzzy interval is maximal if no 
dense fuzzy interval contains it. The well-known A-priori 
algorithm cannot be used here directly, as the downward and 
upward closure properties of frequent sets do not hold in this 
case (this is shown with an example). We propose a variation of 
the A-priori algorithm that works in this situation and gives us 
the maximal dense fuzzy intervals. 



II. RELATED WORKS 

One of the very useful extensions of conventional data mining 
is temporal data mining. In recent times it has attracted many 
researchers. By considering the time dimension in the conventional 
data mining problem, more interesting patterns that are time 
dependent can be extracted. There are mainly two broad directions 
of temporal data mining [7]. One concerns the discovery of causal 
relationships among temporally oriented events. Ordered events 
form sequences, and the cause of an event always occurs before it. 
The other concerns the discovery of similar patterns within the 
same time sequence or among different time sequences. The 
underlying problem is to find frequent sequential patterns in 
temporal databases. 

Wong et al [9] introduced the fuzzy concept into association rule 
mining to deal with quantitative attributes. Quantitative 
attributes are normally handled by partitioning the attribute 
domains and then combining adjacent partitions [8]. Although this 
method can solve problems introduced by finite domains, it causes 
the sharp boundary problem. To soften the effect of sharp 
boundaries, fuzzy sets are used. Here each quantitative attribute 
is associated with several fuzzy sets. A fuzzy association rule 
looks like "if X is A then Y is B", where X and Y are attributes 
and A and B are fuzzy sets which describe X and Y respectively. 
Prade et al [6] defined the support and confidence of a fuzzy 
association rule. 
In [2], Rossi and Ale extended the well-known A-priori 
algorithm for mining association rules to temporal data and 
described a technique to find interesting patterns on the data that 
are time bounded. 

In [5], the problem of mining maximal frequent intervals is 
discussed. They define a maximal frequent interval as an interval 
that is frequent which means that it is present in sufficient 
number of transactions and no other frequent interval contains it. 
Using a pre-fix traversal algorithm, the maximal frequent 
intervals have been found and it was also found experimentally 
that pre-order traversal algorithm outperforms the A-priori based 
algorithm. 

Our approach is different from the above approaches. We are 
taking into account the fact that the intervals of time are of fuzzy 
nature. By calculating density of the fuzzy intervals in a 
particular transactional dataset where transactions are associated 
with time intervals (non-fuzzy) as mentioned in the next section, 
we first compute the dense fuzzy time intervals by using some 
user defined minimum density value and then apply a join 
operation to join neighboring intervals to find maximal dense 
fuzzy intervals. The fuzzy intervals and their membership 
functions are provided by domain experts. 

III. PROBLEM DEFINITION 

A. Some basic definitions related to fuzziness 

Let E be the universe of discourse. A fuzzy set A in E is 
characterized by a membership function A(x) lying in [0,1]. A(x) 
for x ∈ E represents the grade of membership of x in A. Thus a 
fuzzy set A is defined as 

$$A = \{(x, A(x)),\ x \in E\}$$

A fuzzy set A is said to be normal if A(x) = 1 for at least one 
x ∈ E. 

An α-cut of a fuzzy set is an ordinary set of elements with 
membership grade greater than or equal to a threshold α, 
0 ≤ α ≤ 1. Thus an α-cut A_α of a fuzzy set A is characterized by 

$$A_\alpha = \{x \in E;\ A(x) \ge \alpha\}$$ [see e.g. [3]] 

A fuzzy set is said to be convex if all its α-cuts are convex 
sets. 

A fuzzy number is a convex normalized fuzzy set A defined 
on the real line R such that 

1. there exists an x ∈ R such that A(x) = 1, and 

2. A(x) is piecewise continuous. 

Thus a fuzzy number can be thought of as containing the real 
numbers within some interval to varying degrees. 

Fuzzy intervals are special fuzzy numbers satisfying the 
following. 

1. there exists an interval [a, b] ⊂ R such that A(x) = 1 for 
all x ∈ [a, b], and 

2. A(x) is piecewise continuous. 

A fuzzy interval can be thought of as a fuzzy number with a flat 
region. A fuzzy interval A is denoted by A = [a, b, c, d] with 
a ≤ b ≤ c ≤ d, where A(a) = A(d) = 0 and A(x) = 1 for all 
x ∈ [b, c]. A(x) for x ∈ [a, b] is known as the left reference 
function and A(x) for x ∈ [c, d] is known as the right reference 
function. The left reference function is non-decreasing and the 
right reference function is non-increasing [see e.g. [4]]. The 
area of a fuzzy interval is defined as the area bounded by the 
membership function of the fuzzy interval and the real line. 

B. Contribution (vote) of a transaction to a fuzzy interval 

We define the vote of a transaction t associated with the time 
interval [t', t''] for the fuzzy interval A = [a, b, c, d] as 
follows: 

$$\mathrm{vote}_t(A) = \frac{\int_{t'}^{t''} A(x)\,dx}{\int_{a}^{d} A(x)\,dx}$$

where A(x) is the membership function associated with the fuzzy 
interval. 

Here $\int_{t'}^{t''} A(x)\,dx$ is the portion of the area bounded 
by A(x) and the real line included in the time interval [t', t''], 
and $\int_{a}^{d} A(x)\,dx$ is the total area bounded by A(x) and 
the real line. 

Obviously vote_t(A) lies in [0,1]; if A ⊂ [t', t''], then 
vote_t(A) = 1, and if A ∩ [t', t''] = ∅, then vote_t(A) = 0. 
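
As an illustration of this definition, the following minimal sketch computes the vote for a fuzzy interval whose left and right reference functions are linear (a trapezoid). The linear shape and the function names `membership`, `area` and `vote` are assumptions made for the example; the definition itself allows any piecewise continuous reference functions.

```python
def membership(x, a, b, c, d):
    """A(x) for the fuzzy interval [a, b, c, d], assuming linear sides."""
    if x < a or x > d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)


def area(lo, hi, a, b, c, d):
    """Exact integral of A(x) over [lo, hi].

    A(x) is linear between the breakpoints a, b, c, d, so the
    trapezoid rule applied segment by segment is exact.
    """
    lo, hi = max(lo, a), min(hi, d)
    if lo >= hi:
        return 0.0
    pts = [lo] + [p for p in (b, c) if lo < p < hi] + [hi]
    return sum(
        (q - p) * (membership(p, a, b, c, d) + membership(q, a, b, c, d)) / 2
        for p, q in zip(pts, pts[1:])
    )


def vote(t1, t2, a, b, c, d):
    """Vote of a transaction with time interval [t1, t2] for [a, b, c, d]."""
    return area(t1, t2, a, b, c, d) / area(a, d, a, b, c, d)


if __name__ == "__main__":
    # A = [1, 3, 4, 6] has total area 3.
    print(vote(3, 4, 1, 3, 4, 6))   # flat region only
    print(vote(0, 10, 1, 3, 4, 6))  # transaction covers A entirely
    print(vote(7, 8, 1, 3, 4, 6))   # no overlap
```

A transaction that covers the whole support of A yields a vote of 1 and a disjoint one yields 0, in line with the two boundary cases stated above.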

C. Density of a fuzzy time interval in a data set 

The density of a fuzzy interval over a given temporal interval 
dataset D is computed by summing up the votes of all the 
transactions of D for the corresponding fuzzy time interval and 
dividing it by the total number of transactions in D. Each record 
contributes a vote, which falls in [0, 1]. 

$$\mathrm{density}_D(A) = \frac{\sum_{t \in D} \mathrm{vote}_t(A)}{|D|}$$

A fuzzy interval is dense if its density is more than a user 
specified threshold called min_density. 
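
The density computation can be sketched as follows. To keep the example short, the default vote function `overlap_vote` treats the interval as crisp (membership 1 on the whole interval), which is a simplifying assumption and not part of the definition; any other vote function with the same signature can be passed in. All names here are illustrative.

```python
def overlap_vote(transaction, interval):
    """Vote for a crisp interval (lo, hi): overlap length / interval length.

    Stand-in for the fuzzy vote; assumes membership 1 on [lo, hi].
    """
    t1, t2 = transaction
    lo, hi = interval
    return max(0.0, min(t2, hi) - max(t1, lo)) / (hi - lo)


def density(transactions, interval, vote=overlap_vote):
    """Sum of the votes of all transactions divided by |D|."""
    if not transactions:
        raise ValueError("the dataset D must not be empty")
    return sum(vote(t, interval) for t in transactions) / len(transactions)


def is_dense(transactions, interval, min_density, vote=overlap_vote):
    """An interval is dense if its density is more than min_density."""
    return density(transactions, interval, vote) > min_density


if __name__ == "__main__":
    D = [(0, 10), (2, 4), (8, 9)]   # hypothetical transactions
    print(density(D, (2, 6)))       # votes are 1, 0.5 and 0
    print(is_dense(D, (2, 6), 0.4))
```

The comparison is strict, following the wording "more than a user specified threshold".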

D. Join of two fuzzy intervals 

The fuzzy intervals are given by the user as input. Two fuzzy 
intervals A and B are called neighbors or adjacent to each other 
if supp(A ∩ B) ≠ ∅, where supp(A ∩ B) = {x; (A ∩ B)(x) > 0} [see 
e.g. [4]]. We assume that the input fuzzy intervals are such that 
if the intervals are arranged in ascending order according to 
their starting time, then each fuzzy interval has a unique left 
neighbor and a unique right neighbor. Let A = [a_1, b_1, c_1, d_1] 
and B = [a_2, b_2, c_2, d_2] be two adjacent fuzzy intervals. 
Without loss of generality we can assume that a_1 < a_2. We also 
assume that for any two adjacent fuzzy intervals such as A and B 
above, c_1 = a_2 and d_1 = b_2, and for c_1 ≤ x ≤ d_1, 
A(x) = 1 - B(x). Our assumption is natural, since otherwise some 
points would be given more emphasis and some less. We define the 
join of A and B, denoted by A ∧ B, as 

$$A \wedge B = [a_1, b_1, c_2, d_2]$$
where 

$$(A \wedge B)(x) = \begin{cases} A(x), & a_1 \le x \le b_1 \\ A(x) + B(x) = 1, & b_1 \le x \le c_2 \\ B(x), & c_2 \le x \le d_2 \end{cases}$$

$$B(x) = \begin{cases} 0, & x < 4 \text{ or } x > 9 \\ (x-4)/2, & 4 \le x \le 6 \\ 1, & 6 \le x \le 7 \\ (9-x)/2, & 7 \le x \le 9 \end{cases}$$



To explain the joining operation we again consider two fuzzy 
intervals [a_1, b_1, c_1, d_1] and [a_2, b_2, c_2, d_2] whose 
membership functions are shown in Fig. 1. Here c_1 = a_2 and 
b_2 = d_1. Any point between c_1 and d_1 has a membership value 
of A(x) corresponding to A, and corresponding to B it has a 
membership value of B(x) = 1 - A(x), so that A(x) + B(x) = 1. 
Thus our joined fuzzy interval is [a_1, b_1, c_2, d_2] (shown in 
Fig. 2). 
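
The join can be sketched in a few lines. The tuple representation `(a, b, c, d)`, the linear reference functions and the names `join` and `membership` are assumptions made for this illustration.

```python
def membership(x, iv):
    """A(x) for a fuzzy interval iv = (a, b, c, d), assuming linear sides."""
    a, b, c, d = iv
    if x < a or x > d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)


def join(A, B):
    """Join of two adjacent fuzzy intervals A (left) and B (right).

    Requires c1 == a2 and d1 == b2, the conformability assumption
    stated in the definition of the join.
    """
    a1, b1, c1, d1 = A
    a2, b2, c2, d2 = B
    if (c1, d1) != (a2, b2):
        raise ValueError("intervals are not conformable for the join")
    return (a1, b1, c2, d2)


if __name__ == "__main__":
    A, B = (1, 3, 4, 6), (4, 6, 7, 9)
    J = join(A, B)
    print(J)
    # On the overlap the two memberships add up to the joined membership.
    print(membership(5, A), membership(5, B), membership(5, J))
```

With linear sides, the condition A(x) = 1 - B(x) on the overlap follows from c1 = a2 and d1 = b2, so the sketch only checks those two equalities.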




Fig 1: Join of two fuzzy intervals 






Fig 2: Joined interval 

A dense fuzzy interval is maximal if no superset of it is dense. 
However, a subset of it may not be dense, because the downward 
and upward closure properties for dense sets may not hold in this 
case. 

E. Theorem 

The join of two fuzzy intervals is not dense if both of the fuzzy 
intervals are not dense and dense if at least one of the fuzzy 
intervals is dense. 

Proof. To prove the above result we consider a data set D with 8 
transactions. The time-intervals associated with the transactions 
are shown below. 
Table 1: Transaction datasets 

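Consider the fuzzy intervals A = [1, 3, 4, 6] and B = [4, 6, 7, 9], 
where the membership function of A is 

$$A(x) = \begin{cases} 0, & x < 1 \text{ or } x > 6 \\ (x-1)/2, & 1 \le x \le 3 \\ 1, & 3 \le x \le 4 \\ (6-x)/2, & 4 \le x \le 6 \end{cases}$$

and B(x) = A(x - 3), i.e. the same trapezoid shifted to [4, 9]. 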



For each transaction t_i, i = 1, ..., 8, with time interval 
[t_i', t_i''], 

$$\mathrm{vote}_{t_i}(A) = \frac{\int_{t_i'}^{t_i''} A(x)\,dx}{\int_{1}^{6} A(x)\,dx}, \qquad \int_{1}^{6} A(x)\,dx = 3.$$

The values obtained include 1/3, 1, 2/3, 2.75/3 and 0.25/3. 

Therefore, 

$$\mathrm{Density}(A) = \frac{\sum_{i=1}^{8} \mathrm{vote}_{t_i}(A)}{8} = 3.1666666/8 = 0.395833325$$



Similarly, 

$$\mathrm{vote}_{t_i}(B) = \frac{\int_{t_i'}^{t_i''} B(x)\,dx}{\int_{4}^{9} B(x)\,dx}, \qquad \int_{4}^{9} B(x)\,dx = 3.$$

The values obtained include 0, 1/3 and 1.75/3. 

Therefore, 

$$\mathrm{Density}(B) = \frac{\sum_{i=1}^{8} \mathrm{vote}_{t_i}(B)}{8} = 2.5/8 = 0.3125$$
Now, A ∧ B = [1, 3, 7, 9], with 

$$(A \wedge B)(x) = \begin{cases} 0, & x < 1 \text{ or } x > 9 \\ (x-1)/2, & 1 \le x \le 3 \\ 1, & 3 \le x \le 7 \\ (9-x)/2, & 7 \le x \le 9 \end{cases}$$

$$\mathrm{vote}_{t_i}(A \wedge B) = \frac{\int_{t_i'}^{t_i''} (A \wedge B)(x)\,dx}{\int_{1}^{9} (A \wedge B)(x)\,dx}, \qquad \int_{1}^{9} (A \wedge B)(x)\,dx = 6.$$

The values obtained include 1/6, 4/6, 3/6, 2.75/6, 2/6 and 0.25/6. 

Therefore, 

$$\mathrm{Density}(A \wedge B) = \frac{\sum_{i=1}^{8} \mathrm{vote}_{t_i}(A \wedge B)}{8} = 2.83333/8 = 0.35416625$$

So if we take min_density = 0.35, then we see that A is dense but 
B is not dense, whereas A ∧ B is dense. This establishes that the 
downward as well as the upward closure property is not satisfied 
for dense fuzzy intervals. 

IV. PROPOSED ALGORITHM 

The algorithm is a level-wise algorithm similar to the A-priori 
algorithm used for frequent itemset mining [1]. The input to the 
algorithm is a temporal interval dataset D, n fuzzy intervals 
(called basic fuzzy intervals here) defined on the time period 
covered by the dataset and satisfying both assumptions made in 
the definition of the join of fuzzy intervals, and a value of 
min_density (the minimum density value). The algorithm first finds 
the dense basic fuzzy intervals by going through the dataset once 
and using Definition C of Section III. These are the dense fuzzy 
intervals at level 1; we denote this set of dense intervals by 
L_1. Next, each dense fuzzy interval at level 1 is joined with its 
left neighbour and its right neighbour, both of which are basic 
intervals (and may not be dense), using the join operation of 
Definition D in Section III. The results are the candidates C_2 at 
level 2. Using the same technique, going through the dataset once 
more, the dense fuzzy intervals at level 2, say L_2, are obtained. 
These are kept and the others removed. If any of the intervals 
obtained by joining a dense interval A with its neighbours turns 
out to be dense, then A is removed from the list of dense 
intervals maintained at the previous level. This level-wise 
extraction goes on until a level becomes empty. The intervals kept 
at each level are then the maximal dense fuzzy intervals. Note 
that at any level the dense intervals are joined with their 
neighbours from the basic fuzzy intervals only. This is done 
because two new fuzzy intervals obtained by joining basic 
intervals, although neighbours, may not satisfy our second 
assumption (Definition D) for being conformable for the join 
operation. When two intervals A and B are joined, where A is the 
left neighbour of B, the left neighbour of A becomes the left 
neighbour of A ∧ B and the right neighbour of B becomes the right 
neighbour of A ∧ B. 

• Algorithm 1 
Input Q = { A_i ; i = 1, 2, ..., n }   /* set of fuzzy intervals */ 
Set Density[i] = 0 for i = 1, 2, ..., n   /* Density[i] stores the 
                                             density of A_i */ 
for each transaction t in D 
{ 
    Compute vote_t(A_i) for i = 1, 2, ..., n 
    Density[i] += vote_t(A_i) 
} 
for (i = 1, 2, ..., n) do 
{ 
    if (Density[i] / |D| > min_density) 
        Add A_i to L_1 
} 
L_1 = {dense fuzzy intervals at level 1} 
for (k = 2; L_{k-1} ≠ ∅; k++) 
{ 
    C_k = candidate-gen(L_{k-1}) 
    Compute L_k by going through the transactions 
    in the dataset 
} 

Candidate-gen(L_{k-1}, C_k) 
{ 
    for all A ∈ L_{k-1} 
        form A ∧ L and A ∧ R, where L and R are the left 
        and right neighbours of A respectively, in case 
        these exist 
        /* For the extreme intervals both the 
           neighbours may not exist */ 
        C_k = C_k ∪ {A ∧ L, A ∧ R} 
} 
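
A runnable sketch of the level-wise procedure is given below. It assumes linear reference functions and represents every joined interval by the index range (i, j) of the basic intervals it covers, so that the join with a basic neighbour is simply an extension of the range; these representation choices, the function names and the sample data are assumptions of the sketch, not part of Algorithm 1.

```python
def membership(x, iv):
    """A(x) for a fuzzy interval iv = (a, b, c, d), assuming linear sides."""
    a, b, c, d = iv
    if x < a or x > d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)


def area(lo, hi, iv):
    """Exact integral of the piecewise linear A(x) over [lo, hi]."""
    lo, hi = max(lo, iv[0]), min(hi, iv[3])
    if lo >= hi:
        return 0.0
    pts = [lo] + [p for p in iv[1:3] if lo < p < hi] + [hi]
    return sum((q - p) * (membership(p, iv) + membership(q, iv)) / 2
               for p, q in zip(pts, pts[1:]))


def density(iv, transactions):
    """Average vote of all transactions for the fuzzy interval iv."""
    total = area(iv[0], iv[3], iv)
    return sum(area(t1, t2, iv) for t1, t2 in transactions) / (
        total * len(transactions))


def mine_maximal_dense(basic, transactions, min_density):
    """Level-wise mining; basic must be sorted and pairwise conformable."""
    n = len(basic)

    def trapezoid(i, j):
        # Join of the basic intervals i..j: [a_i, b_i, c_j, d_j].
        return (basic[i][0], basic[i][1], basic[j][2], basic[j][3])

    def dense(span):
        return density(trapezoid(*span), transactions) > min_density

    level = {(i, i) for i in range(n) if dense((i, i))}
    kept = []
    while level:
        next_level, survivors = set(), set(level)
        for i, j in level:
            # Candidates: join with the left and right basic neighbours.
            for cand in ((i - 1, j), (i, j + 1)):
                if cand[0] >= 0 and cand[1] < n and dense(cand):
                    next_level.add(cand)
                    survivors.discard((i, j))  # a dense extension exists
        kept.extend(sorted(survivors))
        level = next_level
    return [trapezoid(i, j) for i, j in kept]


if __name__ == "__main__":
    basic = [(1, 2, 2, 3), (2, 3, 3, 4), (3, 4, 4, 5)]  # triangular numbers
    D = [(2, 4), (2, 4)]                                # hypothetical data
    print(mine_maximal_dense(basic, D, 0.6))
    print(mine_maximal_dense(basic, D, 0.7))
```

As in the description above, an interval is dropped from its level only when one of its immediate joins is dense, and candidates are always formed with basic neighbours.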



To illustrate the above algorithm we again consider the example 
given in Section III. For the sake of convenience, consider the 
basic fuzzy intervals to be fuzzy numbers with triangular 
membership functions, which are the input intervals for the first 
level, i.e. C_1 = {A, B, C, D, E, F}, where A = [1, 2, 3], 
B = [2, 3, 4], C = [3, 4, 5], D = [4, 5, 6], E = [5, 6, 7] and 
F = [6, 7, 8], and min_density = 0.4. 

After the first pass we have Density(A) = 0.375, Density(B) = 
0.375, Density(C) = 0.375, Density(D) = 0.5, Density(E) = 0.5, 
Density(F) = 0.1875. 



Thus the set of first level dense fuzzy numbers is 

L_1 = {D, E} 
Candidates for the second pass are 

C_2 = {C ∧ D, D ∧ E, E ∧ F} 
where the members of C_2 are formed by joining the members of 
L_1 with their left and right neighbors in C_1 using the 
definition of join, and C ∧ D = [3, 4, 5, 6], D ∧ E = [4, 5, 6, 7], 
E ∧ F = [5, 6, 7, 8]. 
After the second pass, we get Density(C ∧ D) = 0.4375, 
Density(D ∧ E) = 0.5, Density(E ∧ F) = 0.34375. 
Thus the second level dense sets are 

L_2 = {C ∧ D, D ∧ E} 
Joining with their left and right neighbors from the basic fuzzy 
numbers we obtain the candidates for the third pass as 

C_3 = {B ∧ C ∧ D, C ∧ D ∧ E, D ∧ E ∧ F} 
After the third pass, we get Density(B ∧ C ∧ D) = 0.458333333, 
Density(C ∧ D ∧ E) = 0.458333333, Density(D ∧ E ∧ F) = 
0.3958333333. 
Thus the third level dense sets are 

L_3 = {B ∧ C ∧ D, C ∧ D ∧ E} 
Similarly, the candidates for the fourth pass are 

C_4 = {A ∧ B ∧ C ∧ D, B ∧ C ∧ D ∧ E, C ∧ D ∧ E ∧ F} 
After the fourth pass, we get Density(A ∧ B ∧ C ∧ D) = 0.40625, 
Density(B ∧ C ∧ D ∧ E) = 0.4375, Density(C ∧ D ∧ E ∧ F) = 0.390625. 
Thus the fourth level dense sets are 

L_4 = {A ∧ B ∧ C ∧ D, B ∧ C ∧ D ∧ E} 
Candidates for the fifth pass are 

C_5 = {A ∧ B ∧ C ∧ D ∧ E, B ∧ C ∧ D ∧ E ∧ F} 
After the fifth pass, we get Density(A ∧ B ∧ C ∧ D ∧ E) = 0.425, 
Density(B ∧ C ∧ D ∧ E ∧ F) = 0.3875. 
Thus the fifth level dense sets are 

L_5 = {A ∧ B ∧ C ∧ D ∧ E} 
Candidates for the sixth pass are 

C_6 = {A ∧ B ∧ C ∧ D ∧ E ∧ F} 
After the sixth pass, Density(A ∧ B ∧ C ∧ D ∧ E ∧ F) = 0.385416666, 
which is less than min_density. 

Thus the sixth level is empty, so the algorithm terminates, giving 
the maximal dense set A ∧ B ∧ C ∧ D ∧ E. 



CONCLUSIONS 

In this paper, we have introduced the concept of fuzziness in 
mining maximal dense intervals. In our datasets each transaction 
has associated with it a time interval of the form [start_time, 
end_time]. The proposed algorithm is a level-wise method of 
generating dense fuzzy intervals. At the bottom level we have the 
basic dense fuzzy intervals. In subsequent levels the already 
obtained dense fuzzy intervals are expanded by joining them with 
their neighbours from the basic fuzzy intervals, and their density 
is computed by going through the dataset to check whether they are 
dense or not. The process continues until no candidate is 
generated or some level is empty. The algorithm finally gives only 
the maximal dense fuzzy intervals. Although this algorithm looks 
like the A-priori algorithm, it differs in that it has to take 
into account the fact that the downward and upward closure 
properties of dense intervals do not hold here. 
Abstract — Edge detection is the first step in image segmentation. 
Image segmentation is the process of partitioning a digital image 
into multiple regions or sets of pixels. Edge detection is one of the 
most frequently used techniques in digital image processing. The 
goal of edge detection is to locate the pixels in the image that 
correspond to the edges of the objects seen in the image. Filtering, 
enhancement and detection are the three steps of edge detection. 
Images are often corrupted by random variations in intensity 
values, called noise. Some common types of noise are salt and 
pepper noise, impulse noise and Gaussian noise. However, there 
is a trade-off between edge strength and noise reduction: more 
filtering to reduce noise results in a loss of edge strength. In order 
to facilitate the detection of edges, it is essential to determine 
changes in intensity in the neighborhood of a point. Enhancement 
emphasizes pixels where there is a significant change in local 
intensity values and is usually performed by computing the 
gradient magnitude. Many points in an image have a nonzero 
value for the gradient, and not all of these points are edges for a 
particular application. Therefore, some method should be used to 
determine which points are edge points. The four most frequently 
used edge detection methods are used for comparison. These are 
Roberts, Sobel, Prewitt and Canny edge detection. Another method 
of edge detection is spatial filtering. This paper presents a 
special mask for spatial filtering and compares the standard 
edge detection algorithms (Sobel, Canny, Prewitt and Roberts) with 
spatial filtering. 

Keywords-Spatial Filtering, Median Filter, Edge Detection, Image 
Segmentation. 



I. INTRODUCTION 



Over the years, several methods have been proposed for image 
edge detection, the task of marking points in a digital image 
where the luminous intensity changes sharply. Different 
methodologies have been implemented in various applications 
such as traffic speed estimation [5], image compression [6], 
and classification of images [7]. Most traditional 
edge-detection algorithms in image processing convolve a filter 
operator with the input image and then map overlapping input 
image regions to output signals, which leads to considerable 
loss in edge detection [8,9]. 

Edge and feature points are basic low-level primitives for 
image processing. Edge and feature detection are two of the 
most common operations in image analysis. An edge in an 
image is a contour across which the brightness of the image 
changes abruptly. In image processing, an edge is often 
interpreted as one class of singularities. In a function, 
singularities can be characterized easily as discontinuities 
where the gradient approaches infinity. However, image data 
is discrete, so edges in an image are often defined as the local 
maxima of the gradient. This is the definition we will use here. 
This topic has attracted many researchers and many 
achievements have been made [11-18]. 

For example, Rooms et al proposed to estimate the out-of-focus 
blur in the wavelet domain by examining the sharpness of 
the sharpest edges [11]. Hanghang Tong et al proposed new 
blur detection schemes, based on edge type and sharpness 
analysis using Haar wavelet transforms, which can determine 
whether an image is blurred or not and to what extent it is 
blurred, in response to the demand for image quality assessment 
in terms of blur [12]. X. Marichal proposed using DCT 
information to qualitatively characterize blur extent [13]. 
Berthold K. et al describe the processing performed in the 
course of producing a line drawing from an image obtained 
through an image dissector camera; the edge-marking phase 
uses a non-linear parallel line-follower [14]. Lixia Xue et al 
proposed an edge detection algorithm for multispectral 
remote sensing images, extending the one-dimensional 
cloud-space mapping model to a multidimensional model 
[15]. Mike Heath et al presented a paradigm based on 
experimental psychology and statistics, in which humans rate 
the output of low-level vision algorithms. They demonstrate 
the proposed experimental strategy by comparing four 
well-known edge detectors: Canny, Nalwa-Binford, Sarkar-Boyer, 
and Sobel [16]. Hoover et al at USF have conducted 
such a comparison study based on manually constructed 
ground truth for range segmentation tasks [17]. Krishna Kant 
Chintalapudi et al showed that such localized edge detection 
techniques are non-trivial to design in an arbitrarily deployed 
sensor network. They defined the notion of an edge and 
developed performance metrics for evaluating localized edge 
detection algorithms [10,18]. 

The use of specific linear time-invariant (LTI) filters is the 
most common procedure applied to the edge detection 
problem, and the one which requires the least computational 
effort. In the case of first-order filters, an edge is interpreted as 
an abrupt variation in gray level between two neighboring pixels. 
The goal in this case is to determine at which points in the 
image the first derivative of the gray level as a function of 
position is of high magnitude. By applying a threshold to the 
new output image, edges in arbitrary directions are detected. 

In other approaches, the output of the edge detection filter is the 
input to a polygonal approximation technique that extracts the 
features to be measured. A very important role is played 
in image analysis by what are termed feature points, pixels 
that are identified as having a special property. Feature points 
include edge pixels as determined by the well-known classic 
edge detectors of Prewitt, Sobel, Roberts and Canny, and by 
spatial filtering. Classical operators identify a pixel as a 
particular class of feature point by carrying out some series of 
operations within a window centered on the pixel under scrutiny. 
The classic operators work well in circumstances where the area of 
the image under study is of high contrast. In fact, classic 
operators work very well within regions of an image that can 
be simply converted into a binary image by simple 
thresholding [1]. 

This paper is organized as follows. Section II is for the 
purpose of providing some information about edge detection. 
Section III is focused on simulation results and also focused 
on comparison of various Edge Detection Methods. Section IV 
presents the conclusion. 



II. Edge Detection 

Edge detection techniques transform images into edge images 
by exploiting the changes of grey tones in the images. Edges 
are the sign of a lack of continuity, and of ending. As a result of 
this transformation, the edge image is obtained without 
any change in the physical qualities of the main 
image. Objects consist of numerous parts of different color 
levels. In an image with different grey levels, despite an 
obvious change in the grey levels of the object, the shape of 
the image can be distinguished, as in Fig. 1. 



(IJCSIS) International Journal of Computer Science and Information Security, 

Vol 9, No. 2, February 2011 
An Edge in an image is a significant local change in the 
image intensity, usually associated with a discontinuity in 
either the image intensity or the first derivative of the image 
intensity. Discontinuities in the image intensity can be either 
Step edge, where the image intensity abruptly changes from 
one value on one side of the discontinuity to a different value 
on the opposite side, or Line Edges, where the image intensity 
abruptly changes value but then returns to the starting value 
within some short distance. However, Step and Line edges are 
rare in real images. Because of low frequency components or 
the smoothing introduced by most sensing devices, sharp 
discontinuities rarely exist in real signals. Step edges become 
Ramp Edges and Line Edges become Roof edges, where 
intensity changes are not instantaneous but occur over a finite 
distance. Illustrations of these edge shapes are shown in Fig. 1. 



Figure 1. Types of Edges: (a) Step Edge (b) Ramp Edge (c) Line Edge (d) 
Roof Edge 



A. Steps in Edge Detection 

Edge detection contains three steps, namely filtering, 
enhancement and detection. An overview of the steps in 
edge detection is as follows. 

1) Filtering: Images are often corrupted by random 
variations in intensity values, called noise. Some common 
types of noise are salt and pepper noise, impulse noise and 
Gaussian noise. Salt and pepper noise contains random 
occurrences of both black and white intensity values. 
However, there is a trade-off between edge strength and noise 
reduction. More filtering to reduce noise results in a loss of 
edge strength. 

2) Enhancement: In order to facilitate the detection of edges, 
it is essential to determine changes in intensity in the 
neighborhood of a point. Enhancement emphasizes pixels 
where there is a significant change in local intensity values 
and is usually performed by computing the gradient 
magnitude. 

3) Detection: Many points in an image have a nonzero value 
for the gradient, and not all of these points are edges for a 
particular application. Therefore, some method should be used 
to determine which points are edge points. Frequently, 
thresholding provides the criterion used for detection. 

B. Edge Detection Methods 

The four most frequently used edge detection methods are used 
for comparison. These are (1) Roberts edge detection, (2) 
Sobel edge detection, (3) Prewitt edge detection and (4) 
Canny edge detection. Another method of edge detection 
is spatial filtering. The details of the methods are as follows: 

1) The Roberts Detection: The Roberts Cross operator 
performs a simple, quick to compute, 2-D spatial gradient 
measurement on an image. It thus highlights regions of high 
spatial frequency which often correspond to edges. In its most 
common usage, the input to the operator is a grayscale image, 
as is the output. Pixel values at each point in the output 
represent the estimated absolute magnitude of the spatial 
gradient of the input image at that point. Fig. 2. shows Roberts 
Mask. 
Figure 2. Roberts Mask 
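
A minimal sketch of the Roberts Cross operator on a grayscale image stored as a list of rows is shown below. It uses the standard 2x2 Roberts masks and the |Gx| + |Gy| approximation of the gradient magnitude; the function name `roberts_cross` and the sample image are assumptions made for the illustration.

```python
def roberts_cross(image):
    """Approximate gradient magnitude |Gx| + |Gy| with the Roberts masks.

    Gx uses the mask [[1, 0], [0, -1]] and Gy uses [[0, 1], [-1, 0]].
    The output is one row and one column smaller than the input.
    """
    h, w = len(image), len(image[0])
    out = []
    for r in range(h - 1):
        row = []
        for c in range(w - 1):
            gx = image[r][c] - image[r + 1][c + 1]
            gy = image[r][c + 1] - image[r + 1][c]
            row.append(abs(gx) + abs(gy))
        out.append(row)
    return out


if __name__ == "__main__":
    # Hypothetical image with a vertical step edge between columns 1 and 2.
    img = [[0, 0, 10, 10]] * 3
    for row in roberts_cross(img):
        print(row)
```

The response is nonzero only in the column where the 2x2 window straddles the step.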

2) The Prewitt Detection: The Prewitt edge detector is an 
appropriate way to estimate the magnitude and orientation of 
an edge. Although differential gradient edge detection needs a 
rather time-consuming calculation to estimate the orientation 
from the magnitudes in the x- and y-directions, compass 
edge detection obtains the orientation directly from the kernel 
with the maximum response. The Prewitt operator is limited to 
8 possible orientations; however, experience shows that most 
direct orientation estimates are not much more accurate. This 
gradient-based edge detector is estimated in the 3x3 
neighbourhood for eight directions. All eight convolution 
masks are calculated. One convolution mask is then selected, 
namely the one with the largest modulus. Fig. 3 shows the Prewitt 
mask. 
Figure 3. Prewitt Mask 



3) The Sobel Detection: The Sobel operator performs a 2-D 
spatial gradient measurement on an image and so emphasizes 
regions of high spatial frequency that correspond to edges. 
Typically it is used to find the approximate absolute gradient 
magnitude at each point in an input grayscale image. In theory 
at least, the operator consists of a pair of 3x3 convolution 
kernels, one of which is simply the other rotated by 90°. This 
is very similar to the Roberts Cross operator. The convolution 
masks of the Sobel detector are given in Fig. 4. Fig. 5 shows 
the edge patterns for the Sobel edge detector. 
Figure 4. Sobel Mask 
Figure 5. Edge patterns for Sobel edge detector 
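
The pair of 3x3 Sobel kernels can be applied as in the following sketch, which works on a grayscale image stored as a list of rows and skips the one-pixel border. The kernel values are the standard Sobel masks; the function name `sobel_magnitude` and the sample image are assumptions of the example. The kernels are applied without flipping (correlation), which changes only the sign of the responses and not the magnitude.

```python
import math

# Standard Sobel kernels; SOBEL_Y is SOBEL_X rotated by 90 degrees.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[1, 2, 1], [0, 0, 0], [-1, -2, -1]]


def sobel_magnitude(image):
    """Gradient magnitude sqrt(Gx^2 + Gy^2) for every interior pixel."""
    h, w = len(image), len(image[0])
    out = []
    for r in range(1, h - 1):
        row = []
        for c in range(1, w - 1):
            gx = sum(SOBEL_X[i][j] * image[r - 1 + i][c - 1 + j]
                     for i in range(3) for j in range(3))
            gy = sum(SOBEL_Y[i][j] * image[r - 1 + i][c - 1 + j]
                     for i in range(3) for j in range(3))
            row.append(math.hypot(gx, gy))
        out.append(row)
    return out


if __name__ == "__main__":
    # Hypothetical image with a vertical step edge.
    img = [[0, 0, 10, 10]] * 3
    print(sobel_magnitude(img))
```

Thresholding the returned magnitudes gives a binary edge map, which corresponds to the detection step described earlier.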

4) The Canny Detection: Canny edge detection is an 
important step towards mathematically solving edge detection 
problems. This edge detection method is optimal for step 
edges corrupted by white noise. It aims at edge detection with 
a low probability of missing true edges and a low probability of 
detecting false edges [2]. The Canny algorithm uses an 
optimal edge detector based on a set of criteria which include 
finding the most edges by minimizing the error rate, marking 
edges as closely as possible to the actual edges to maximize 
localization, and marking edges only once when a single edge 
exists for minimal response [3]. 

Canny used three criteria to design his edge detector. The 
first requirement is reliable detection of edges with low 
probability of missing true edges, and a low probability of 
detecting false edges. Second, the detected edges should be 
close to the true location of the edge. Lastly, there should be 
only one response to a single edge. To quantify these criteria, 
the following functions are defined: 



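$$\mathrm{SNR}(f) = \frac{A}{n_0}\,\frac{\left|\int_{-\infty}^{0} f(x)\,dx\right|}{\sqrt{\int_{-\infty}^{\infty} f^{2}(x)\,dx}} \tag{1}$$

$$\mathrm{Loc}(f) = \frac{A}{n_0}\,\frac{\left|f'(0)\right|}{\sqrt{\int_{-\infty}^{\infty} f'^{2}(x)\,dx}} \tag{2}$$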



where A is the amplitude of the signal and $n_0^2$ is the 
variance of the noise. SNR(f) defines the signal-to-noise ratio and 
Loc(f) defines the localization of the filter f(x). 
The Canny edge detection algorithm runs in 5 separate steps: 

1. Smoothing: Blurring of the image to remove noise. 

2. Finding gradients: The edges should be marked where the 
gradients of the image have large magnitudes. 

3. Non-maximum suppression: Only local maxima should 
be marked as edges. 

4. Double thresholding: Potential edges are determined by 
thresholding. 

5. Edge tracking by hysteresis: Final edges are determined 
by suppressing all edges that are not connected to a very 
certain (strong) edge. [19] 
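
Steps 4 and 5 can be illustrated with a short Python sketch. It assumes that a gradient-magnitude image has already been computed and thinned by non-maximum suppression; the function name `hysteresis` and the threshold parameters `low` and `high` are hypothetical. Pixels at or above `high` are treated as strong edges, and weak pixels (at or above `low`) are kept only if they are 8-connected to a strong pixel.

```python
from collections import deque


def hysteresis(magnitude, low, high):
    """Keep strong pixels (>= high) and weak pixels (>= low) connected to them."""
    h, w = len(magnitude), len(magnitude[0])
    edges = [[False] * w for _ in range(h)]
    queue = deque()
    # Double thresholding: seed the search with the strong pixels.
    for y in range(h):
        for x in range(w):
            if magnitude[y][x] >= high:
                edges[y][x] = True
                queue.append((y, x))
    # Edge tracking: grow from strong pixels through 8-connected weak pixels.
    while queue:
        y, x = queue.popleft()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if (0 <= ny < h and 0 <= nx < w and not edges[ny][nx]
                        and magnitude[ny][nx] >= low):
                    edges[ny][nx] = True
                    queue.append((ny, nx))
    return edges
```

For example, with `low=4` and `high=8`, a weak pixel of magnitude 5 next to a pixel of magnitude 9 is kept, while an isolated pixel of magnitude 5 is suppressed.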

5) The Spatial Filtering Detection: We implement image 
edge detection so that we can identify the boundaries of objects 
in an image. For this, we apply a spatial mask. Fig. 6 shows 
the spatial mask. 

"-1 -2 -f 

-2 2 

1 2 1 

Figure 6. Spatial Mask 

The mechanics of spatial filtering are illustrated in Fig. 7. 
The process consists simply of moving the center of the filter 
mask w from point to point in an image f. At each point (x, y), 
the response of the filter at that point is the sum of the 
products of the filter coefficients and the corresponding 
neighborhood pixels in the area spanned by the filter mask. [4] 
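
The sum-of-products response described above can be sketched as follows. This is a minimal illustration for any odd-sized mask; the function names are hypothetical, and pixels where the mask does not fit entirely inside the image are left at zero (one of several possible border conventions). The mask coefficients of Fig. 6 are not assumed here; any mask can be passed in.

```python
def filter_response(image, mask, y, x):
    """Sum of products of mask coefficients and the pixels under the mask."""
    a, b = len(mask) // 2, len(mask[0]) // 2
    return sum(mask[s + a][t + b] * image[y + s][x + t]
               for s in range(-a, a + 1) for t in range(-b, b + 1))


def spatial_filter(image, mask):
    """Move the mask centre over every pixel where the mask fits entirely."""
    a, b = len(mask) // 2, len(mask[0]) // 2
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(a, h - a):
        for x in range(b, w - b):
            out[y][x] = filter_response(image, mask, y, x)
    return out
```

As a usage example, a Laplacian-style mask `[[0, -1, 0], [-1, 4, -1], [0, -1, 0]]` (chosen only for illustration) gives a zero response on flat regions and a large response at isolated intensity changes.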

Figure 7. The Mechanics of Spatial Filtering. 

III. SIMULATION RESULTS 

The algorithm for image edge detection was tested on 
various images, and its outputs were compared with those of the 
existing edge detection algorithms. The outputs of this 
algorithm show much more distinctly marked edges and thus 
have a better visual appearance than those of the algorithms 
in current use. The sample output in Fig. 8 compares the 
"Sobel", "Roberts", "Prewitt" and "Canny" edge detection 
algorithms with one another, and Fig. 9 compares them with 
the "Spatial Filtering" algorithm. The output generated by 
"Spatial Filtering" shows the edges of the image more 
distinctly than the output of any of the standard edge 
detection algorithms (Sobel, Canny, Prewitt and Roberts). In 
addition, "Spatial Filtering" traces more of the edges. 

Thus the "Spatial Filtering" edge detection algorithm 
provides better edge detection, extracts the edges with high 
efficiency, and avoids double edges, so that the resulting 
image contains single edges. 

Figure 8. Results of our algorithm compared with standard edge detection 
algorithms (Sobel, Canny, Prewitt & Roberts) 
Figure 9. Results of our algorithm compared with Spatial Filtering 






http://sites.google.com/site/ijcsis/ 
ISSN 1947-5500 



(IJCSIS) International Journal of Computer Science and Information Security, 

Vol 9, No. 2, February 2011 



IV. Conclusion 



This paper presented two approaches to edge detection. The 
first uses the standard edge detection algorithms (Sobel, 
Canny, Prewitt and Roberts), and the second uses the Spatial 
Filtering method. The output generated by "Spatial Filtering" 
shows the edges of the image more distinctly than the output 
of any of the standard edge detection algorithms. In addition, 
"Spatial Filtering" traces more of the edges, and its outputs 
show much more distinctly marked edges and thus have a better 
visual appearance than those of the standard algorithms. Thus 
the "Spatial Filtering" edge detection algorithm provides 
better edge detection, extracts the edges with high 
efficiency, and avoids double edges, so that the resulting 
image contains single edges. 



References 



[1] Abdallah A. Alshennawy and Ayman A. Aly, "Edge Detection in Digital 
Images Using Fuzzy Logic Technique", World Academy of Science, 
Engineering and Technology 51, 2009. 
[2] N. Senthilkumaran and R. Rajesh, "Edge Detection Techniques for 
Image Segmentation - A Survey of Soft Computing Approaches", 
International Journal of Recent Trends in Engineering, Vol. 1, No. 2, 
May 2009. 
[3] Hong Shan Neoh and Asher Hazanchuk, "Adaptive Edge Detection for 
Real-Time Video Processing using FPGAs". 
[4] N. B. Bahadure, "Image Processing: Filteration, Gray Slicing, 
Enhancement, Quantization, Edge Detection and Blurring of Images in 
Matlab", International Journal of Electronic Engineering Research, 
ISSN 0975 - 6450, Volume 2, Number 2 (2010), pp. 145-151. 
[5] Dailey D. J., Cathey F. W. and Pumrin S. 2000. An Algorithm to 
Estimate Mean Traffic Speed Using Uncalibrated Cameras. In 
proceedings of IEEE Transactions on intelligent transport systems, 
Vol. 1. 
[6] Desai U. Y., Mizuki M. M., Masaki I., and Berthold K.P. 1996. Edge 
and Mean Based Image Compression. Massachusetts Institute of 
Technology Artificial Intelligence Laboratory, A.I. Memo No. 1584. 
[7] Rafkind B., Lee M., Shih-Fu and Yu C. H. 2006. Exploring Text and 
Image Features to Classify Images in Bioscience Literature. In 
Proceedings of the BioNLP Workshop on Linking Natural Language 
Processing and Biology at HLT-NAACL 06, pages 73-80, New York 
City. 
[8] Roka A., Csapo A., Resko B., Baranyi P. 2007. Edge Detection Model 
Based on Involuntary Eye Movements of the Eye-Retina System. Acta 
Polytechnica Hungarica Vol. 4. 
[9] Shashank Mathur and Anil Ahlawat, "Application of Fuzzy Logic on 
Image Edge Detection", Intelligent Technologies and Applications. 
[10] Leila Fallah Araghi and Mohammad Reza Arvan, "An Implementation 
Image Edge and Feature Detection Using Neural Network", 
Proceeding of the International MultiConference of Engineers and 
Computer Scientists 2009 Vol I, IMECS 2009, March 18-20, 2009, 
Hong Kong. 
[11] F. Rooms and A. Pizurica, "Estimating image blur in the wavelet 
domain", ProRISC 2001, pp. 568-572. 
[12] Hanghang Tong, Mingjing Li, Hongjiang Zhang, Changshui Zhang, 
"Blur Detection for Digital Images Using Wavelet Transform", ICME04, 
2004. 
[13] X. Marichal, W.Y. Ma and H.J. Zhang, "Blur Determination in the 
Compressed Domain Using DCT Information", Proceedings of the IEEE 
ICIP'99, pp. 386-390. 
[14] Berthold K. P. Horn, "The Binford-Horn LINE-FINDER", 
Massachusetts Institute of Technology Artificial 
Intelligence Laboratory, 1971. 
[15] Lixia Xue, Zuocheng Wang, "An Edge Detection Algorithm for Remote 
Sensing Image", The International Archives of the Photogrammetry, 
Remote Sensing and Spatial Information Sciences, Vol. XXXVII, Part 
B3b, Beijing 2008. 
[16] Mike Heath, Sudeep Sarkar, Thomas Sanocki, and Kevin Bowyer, 
"Comparison of Edge Detectors: A Methodology and Initial Study", 
Computer Vision and Image Understanding, Vol. 69, No. 1, January, pp. 
38-54, 1998. 
[17] A. Hoover, G. Jean-Baptiste, X. Jiang, P. J. Flynn, H. Bunke, D. 
Goldgof, and K. Bowyer, "Range image segmentation: The user's 
dilemma", in International Symposium on Computer Vision, 1995, pp. 
323-328. 
[18] K. Chintalapudi, R. Govindan, "Localized Edge Detection in Sensor 
Fields", Ad-hoc Networks Journal, 2003. 
[19] J. Canny, "A Computational Approach to Edge Detection", IEEE 
Transactions on Pattern Analysis and Machine Intelligence, Vol. 8, No. 
6, Nov. 1986. 



AUTHORS PROFILE 

Ehsan Azimi Rad received the B.Sc. degree in 
computer engineering and the M.Sc. degree in control 
engineering with honors from the Ferdowsi University 
of Mashhad, Mashhad, Iran, in 2006 and 2009, 
respectively. He is now a Ph.D. student in electrical and 
electronic engineering at Tarbiat Moallem University of 
Sabzevar, Iran. His research interests are fuzzy 
control systems and their applications in urban traffic and 
other problems, nonlinear control, image 
processing and pattern recognition. 

Javad Haddadnia received his B.S. and M.S. degrees 
in electrical and electronic engineering with the first 
rank from Amirkabir University of Technology, 
Tehran, Iran, in 1993 and 1995, respectively. He 
received his Ph.D. degree in electrical engineering from 
Amirkabir University of Technology, Tehran, Iran, in 
2002. He joined Tarbiat Moallem University of 
Sabzevar, Iran. His research interests include neural 
networks, digital image processing, computer vision, and 
face detection and recognition. He has published 
several papers in these areas. He served as a 
Visiting Research Scholar at the University of Windsor, 
Canada, during 2001-2002. He is a member of SPIE, 
CIPPR, and IEICE. 








(IJCSIS) International Journal of Computer Science and Information Security, 

Vol 9, No. 2, 2011 



Improving Cathodic Protection System using 
SMS-based Notification 



Mohd Hilmi Hasan 

Computer and Information Sciences Department 

Universiti Teknologi PETRONAS 

Bandar Seri Iskandar, Tronoh, Malaysia 

mhilmi_hasan@petronas.com.my 



Nur Hanis Abdul Hamid 

Computer and Information Sciences Department 

Universiti Teknologi PETRONAS 

Bandar Seri Iskandar, Tronoh, Malaysia 



Abstract — Mobile services have had a significant impact on 
various industries. Demand for them has grown not only in the 
telecommunication sector but also in numerous other 
sectors such as banking, business, entertainment and education. 
The objective of this paper is to present a mobile system 
developed to enhance the current cathodic 
protection (CP) system. The developed system sends a 
notification to technicians via SMS if a fault occurs in a 
gas pipeline. The system was developed in a three-tier 
architecture and verified with functional testing. It is connected 
to the CP system, which monitors CP measurements 
on the gas pipeline. If the CP system detects a fault, it 
sends an instruction to the developed system, which then 
invokes SMS notification delivery to technicians. The system has 
been developed successfully and is believed to improve on the current CP 
system, which requires personnel to perform the monitoring 
process manually. The study implies gains in effectiveness and time saving, as 
responsible personnel or technicians are notified of any fault 
anytime and anywhere through their mobile phones. For future work, 
it is recommended that the system also be equipped with 
proactive notification delivery, in which technicians are 
notified when a fault is expected to occur. 

Keywords- SMS; notification system; SMS-based system; 
cathodic protection 



I. Introduction 



The explosion in the development of mobile applications and 
services has had a significant impact on the mobile phone 
industry. This industry has seen growing demand in 
numerous sectors such as business [1], banking [2] and gaming 
[3]. It is reported that in May 2010 alone, 92 
countries generated over ten million mobile advertisement 
requests [4]. The benefits of mobile services accrue not only 
to customers but to service providers too: they offer service 
providers a broad range of business opportunities with 
potential streams of revenue. Mobile services such as 
m-commerce are forecast to grow further globally [5]. The 
main factor behind this wide acceptance of mobile services is 
believed to be their anytime, anywhere accessibility. Another 
factor that plays a big role is their flexibility in meeting 
users' expectations. For instance, advertisements have long 
been regarded negatively as garbage by customers. However, 
with new advancements in mobile services, advertisers may now 
provide more diversified and personalized advertisements to 
customers. Personalized m-advertisement is effective in that 
it allows an appropriate message to reach the most promising 
customers at the best time and in the right place [6]. 

This paper focuses on yet another mobile service 
development: it enhances a cathodic protection (CP) system 
with an SMS notification feature. CP is fundamental to 
pipeline integrity management and is broadly used in gas, 
petrochemical and water transmission and distribution. 
Cathodic protection is implemented to protect pipelines, and 
measurements of CP data must be reported 
regularly for monitoring purposes. Two important 
measurements are the level of protection applied to the pipeline at 
the source and the level along the pipeline itself [7]. In this study, a 
system was developed to notify technicians via SMS of any fault 
in the CP measurements on pipelines. The use of SMS in 
this system is believed to be very important, mainly because it 
reduces the human intervention required in the monitoring process. The 
developed system exploits the advantages 
offered by mobile solutions, which have 
become a popular choice for improving 
customer-oriented systems. The work in [8] shows that 
a mobile solution improves the tourism industry; the system enables 
users to receive new tourist content with minimal user 
intervention. The work in [9] shows 
notification moving from the conventional notice 
board to SMS; it focused on implementing SMS-based 
notification in an e-parcel management system. 
SMS-based notification has also been implemented in an asset 
management system [10], in which the assets' locations 
are tracked using RFID and GIS technology and users receive 
automated notification of asset movement 
and malfunction alarms via SMS. Furthermore, the 
work in [11] shows the development of a mobile 
notification system in a university. The system sends notifications 
to students through a mobile instant messaging application 
installed on their mobile phones [12], so 
students do not need to log on to the e-learning system 
to retrieve announcements made by their lecturers. All these 
systems show that mobile solutions provide significant 
benefits to users, specifically in providing real-time notification. 
Real-time notification is believed to be an efficient way of 
shortening work process cycles and increasing 
information flow [13]. 

In a nutshell, the objective of this paper is to present the 
improvement of the CP system through the implementation of 
SMS-based alarm notification. The system notifies 
technicians or responsible personnel via SMS of any fault 
that occurs. The developed system is named the SMS-based 
Cathodic Protection (SMS-CP) system. 

II. METHODOLOGY 

This study began with a literature study and data gathering. 
The results of this initial work were then analysed 
to produce the system requirements. The study 
continued with system design activities, in which the system 
architecture, system flow, use case diagram and database were 
designed. These designs were then used in the implementation 
process, in which the system was developed and tested 
iteratively until it evolved into the final product. In every iteration, a 
prototype was produced and evaluated against the system 
requirements. Lastly, the final version of the developed system 
was verified with functional testing. The testing outcomes 
showed that the objective of this study had been 
achieved. 

A. Development Tools 

A Microsoft Windows XP personal computer was used in 
this study for system development. It was then also used as the 
server hosting the developed system and the SMS 
gateway software. A Global System for Mobile 
Communications (GSM) modem was used to 
support the SMS sending functionality. 

PHP and MySQL were used as the development language 
and database respectively. They were chosen to ensure 
interoperability with the current PHP-based CP system. In 
addition, Joomla! was used to develop the CP system manager, 
and the Ozeki NG SMS gateway software was used to manage 
and perform the SMS sending functionality. 



B. System Architecture 

Fig. 1 shows the architecture of the developed 
system, which follows a three-tier architecture. 
The CP measurement data are retrieved from the measuring 
apparatus installed on the gas pipeline and sent to the CP 
system manager for further processing and storage 
in its database. This study was based on a real case 
of a gas company in Malaysia. However, due to 
confidentiality and the restrictions on system authorization 
imposed by the company, the actual CP system manager could 
not be used in this study. Instead, a prototype system named 
MANTAU was developed and used. MANTAU is a web-based 
system developed using the PHP scripting language. 

The developed system, SMS-CP, is installed on the server. It 
contains a PHP script module that continuously 
checks the CP measurement data from the CP 
system manager. If any fault data are found, the SMS-CP 
system produces an instruction message to invoke the Ozeki 
NG SMS gateway software to send an SMS. The details of the 
fault data, namely the area (location) with its reference 
number, the date, the time and the CP measurement, are sent to the Ozeki 
NG SMS gateway software, together with the phone numbers of 
the technicians. The gateway software then 
creates an SMS message to be sent to the technicians. A 
database is also installed on the server, in which the SMS-CP system stores 
details of fault occurrences and the phone numbers of 
technicians. 



Figure 1. System Architecture. 

The Ozeki NG SMS gateway forwards the created SMS 
message to the GSM modem. The GSM modem then completes 
the notification process by forwarding the message to 
all authorized technicians via SMS. Fig. 2 and Fig. 3 below 
show the use case diagram and the sequence diagram of the 
developed system respectively. 

III. Results and Discussion 

A. System Prototype 

The CP system manager was developed as a web-based 
system named MANTAU. Its 
functionalities include receiving, processing and storing 
CP measurement data. Fig. 4 shows the interface of the MANTAU 
system displaying a graph of data for 2007. 



Figure 2. Use Case diagram. 

Figure 3. Sequence diagram. 

Figure 4. Interface of MANTAU system (CP system manager). 

The data retrieved from the CP measuring apparatus contain 
five values: pipeline location, location code, date, 
time, and Transformer Rectifier (TR) value. These data are 
represented as follows: 

{location, code, date {day, month, year}, time {hour, minute, 
second}, TR } 

These data are stored in the MANTAU database for further 
processing as well as for future reference. 

The SMS-CP system, located on the server, 
contains a PHP script module that continuously checks the 
MANTAU database for faulty CP measurement data. In this 
study, the checking interval was set to 30 seconds, which means the 
SMS-CP system checks the CP measurement data every half 
minute. If a fault has occurred, the SMS-CP system retrieves 
the data and stores them in its own database. At the same 
time, it triggers another PHP script module that instructs the Ozeki 
NG SMS gateway software to send an SMS notification message 
to the authorized technicians. In this case, the SMS-CP system 
forwards the whole fault record along with the technicians' phone 
numbers to the Ozeki NG SMS gateway software. These data are 
represented as follows: 

{location, code, date {day, month, year}, time {hour, minute, 
second}, TR, phone} 
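
The checking procedure described above can be sketched as follows. The sketch is written in Python for brevity rather than in the PHP used by the actual system, and it is only an illustration under stated assumptions: the names `Measurement`, `is_fault`, `format_sms` and `check_once` are hypothetical, the message layout is invented for the example, and `send_sms` stands in for whatever call hands the message to the SMS gateway (the gateway's real interface is not shown). The fault rule (TR value of 10.00 V or below) and the 30-second interval are taken from the study.

```python
from dataclasses import dataclass

FAULT_THRESHOLD_VOLTS = 10.00  # fault rule stated in the testing section
CHECK_INTERVAL_SECONDS = 30    # checking interval used in the study


@dataclass
class Measurement:
    location: str
    code: str
    date: tuple   # (day, month, year)
    time: tuple   # (hour, minute, second)
    tr: float     # Transformer Rectifier value in volts


def is_fault(m):
    """A record is a fault when its TR value is at or below the threshold."""
    return m.tr <= FAULT_THRESHOLD_VOLTS


def format_sms(m):
    """Build the notification text (hypothetical layout)."""
    d, mo, y = m.date
    h, mi, s = m.time
    return (f"{m.location}, {m.code}, {d}/{mo}/{y}, "
            f"{h:02d}:{mi:02d}:{s:02d}, TR: {m.tr:.2f}V")


def check_once(measurements, phones, send_sms):
    """One polling pass: send one SMS per fault per phone; return fault count."""
    faults = [m for m in measurements if is_fault(m)]
    for m in faults:
        text = format_sms(m)
        for phone in phones:
            send_sms(phone, text)
    return len(faults)
```

A scheduler would call `check_once` every `CHECK_INTERVAL_SECONDS`, passing only the records that arrived since the previous pass so that a fault is not reported twice.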

Fig. 5 shows the notification message received on a technician's 
mobile phone via SMS. In this example, the data received are 
as follows: 

{CP1 3 Ulu Pauh, 0008005, {23, 9, 2008}, {2, 43, 25}, TR: 
5.90V} 




Figure 5. Notification message via SMS. 



B. System Testing 

The developed system was tested using the functional testing 
method. A set of test cases was created based on the system 
requirements. Table I shows the test cases used in this testing 
process. 

TABLE I. Test cases for functional testing 

Test Case | Expected Outcome 
1. The data set contains NO fault data. | The receiver should not get an SMS message. 
2. The data set contains ONE fault data. | The receiver should get ONE SMS message. 
3. The data set contains ONE fault data. | The correct data should be displayed in the SMS message. 
4. The data set contains ONE fault data. | The SMS message should be received within an acceptable time duration. 
5. The data set contains MORE THAN ONE fault data. | The receiver should get the right number of SMS messages. 
6. The data set contains MORE THAN ONE fault data. | All received SMS messages should contain correct data. 
7. The data set contains MORE THAN ONE fault data. | The SMS messages should be received within an acceptable time duration. 

Since the developed system was not linked to the real CP 
measurement apparatus, three data sets were created as 
input for the CP system manager: 1) a set without fault 
data; 2) a set containing one fault record; and 3) a set 
containing more than one fault record. Each data set contains 
30 lines of data, and each line contains data as follows: 

{location, code, date {day, month, year}, time {hour, minute, 
second}, TR} 

It is also important to note that a fault record is one whose TR 
value (in volts) is 10.00 or below. 

In the functional test performed, all test 
cases in Table I produced positive (successful) outcomes. 
The time taken for the receiver to receive the notification 
message was between 30 seconds and 1.5 minutes. This 
duration was considered acceptable. 

IV. Conclusion 

The developed system enables technicians in a gas company 
to receive notification via SMS of any fault that occurs in the 
pipeline. The notification contains important information, 
namely the location, date, time and measurement value. The 
system offers benefits in terms of effectiveness and time 
saving, as technicians are notified anytime and anywhere 
through their mobile phones. 

The system consists of the CP system manager, the SMS-CP 
system and the Ozeki NG SMS gateway software. The CP system 
manager retrieves and processes the measurement data, which 
are then stored in its database. The 
SMS-CP system contains a checking module that continuously 
checks the CP system manager for fault data. 
If a fault has occurred, this system triggers an 
instruction asking the Ozeki NG SMS gateway software to create 
an SMS message. The gateway software inserts all the data 
received from the SMS-CP system into the message and forwards it through 
the GSM modem to the technicians. 

For future work, it is recommended that the system 
also include functionality for proactive notification, 
meaning that a notification message is sent to 
technicians when a fault is expected to occur. 


Abstract — Digital images are now in widespread use, and 
the size of image databases is increasing 
enormously. Finding images in such databases attracts a great 
deal of interest, and there is a great need for an efficient technique for 
doing so. To be found, an image has to be 
represented by certain features; color and texture are two 
important visual features of an image. In this paper we propose an 
efficient image retrieval technique that uses the dominant color and 
texture features of an image. As a first step, an image is uniformly divided into 8 
coarse partitions. After this coarse partition, 
the centroid of each partition ("color Bin" in MPEG-7) is selected 
as its dominant color. The texture of an image is obtained using the 
Gray Level Co-occurrence Matrix (GLCM). The color and texture 
features are normalized, and the weighted Euclidean distance between color 
and texture features is used to retrieve similar images. The 
efficiency of the method is demonstrated with the results. 

Keywords- Image retrieval, dominant color, Gray level co- 
occurrence matrix. 

I. INTRODUCTION 

Content-based image retrieval (CBIR) [1] has become a 
prominent research topic because of the proliferation of video 
and image data in digital form. Increased bandwidth 
availability to access the internet in the near future will allow 
the users to search for and browse through video and image 
databases located at remote sites. Therefore fast retrieval of 
images from large databases is an important problem that needs 
to be addressed. 

Image retrieval systems attempt to search through a 
database to find images that are perceptually similar to a query 
image. CBIR is an important alternative and complement to 
traditional text-based image searching and can greatly enhance 
the accuracy of the information being returned. It aims to 
develop an efficient visual content-based technique to search, 
browse and retrieve relevant images from large-scale digital 
image collections. Most proposed CBIR techniques [2,3,4] 
automatically extract low-level features (e.g. color, texture, 
shapes and layout of objects) to measure the similarities 
among images by comparing the feature differences. 

Color is one of the most widely used low-level visual 
features and is invariant to image size and orientation [1]. The 
conventional color features used in CBIR are the color 
histogram, the color correlogram, and the dominant color descriptor 
(DCD). 

The color histogram is the most commonly used color 
representation, but it does not include any spatial information. 
The color correlogram describes the probability of finding color 
pairs at a fixed pixel distance and thus provides spatial information. 
Therefore the color correlogram yields better retrieval accuracy 
than the color histogram. The color autocorrelogram is a 
subset of the color correlogram that captures the spatial 
correlation between identical colors only. Since it provides 
significant computational benefits over the color correlogram, it is 
more suitable for image retrieval. DCD is one of the MPEG-7 color 
descriptors [4]. It describes the salient color distributions 
in an image or a region of interest, and provides an effective, 
compact, and intuitive representation of the colors present in an 
image. However, DCD similarity matching does not fit human 
perception very well, and it causes incorrect ranks for 
images with similar color distributions [5, 6]. In [7], Yang et al. 
presented a color quantization method for dominant color 
extraction, called the linear block algorithm (LBA), and 
showed that LBA is efficient in color quantization and 
computation. To retrieve more 
similar images from digital image databases (DBs) effectively, Lu et 
al. [8] use the color distributions, the mean value and the 
standard deviation to represent the global characteristics of 
the image, and the image bitmap to represent the local 
characteristics of the image, increasing the accuracy of the 
retrieval system. 

In [3,12], HSV color and GLCM texture are used as feature 
descriptors of an image. There the HSV color space is quantized 
with non-equal intervals: H is quantized into 8 bins, S into 3 
bins and V into 3 bins, so color is represented by a 
one-dimensional vector of size 72 (8 x 3 x 3). Instead of using 72 
color feature values to represent the color of an image, it is better 
to use a compact representation of the feature vector. For 
simplicity and without loss of generality, the RGB color space 
is used in this paper. 
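
To make the 72-bin representation of [3,12] concrete, the following Python sketch maps an RGB pixel to one of 8 x 3 x 3 HSV bins and builds a normalised histogram. It is an illustration only: the cited works use non-equal quantization intervals that are not given here, so uniform intervals are assumed instead, and the function names `hsv_bin` and `hsv_histogram` are hypothetical.

```python
import colorsys

H_BINS, S_BINS, V_BINS = 8, 3, 3  # 8 x 3 x 3 = 72 bins


def hsv_bin(r, g, b):
    """Map an 8-bit RGB pixel to an index in 0..71 (uniform bins assumed)."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    hi = min(int(h * H_BINS), H_BINS - 1)
    si = min(int(s * S_BINS), S_BINS - 1)
    vi = min(int(v * V_BINS), V_BINS - 1)
    return (hi * S_BINS + si) * V_BINS + vi


def hsv_histogram(pixels):
    """Normalised 72-bin histogram of an iterable of (r, g, b) pixels."""
    hist = [0.0] * (H_BINS * S_BINS * V_BINS)
    count = 0
    for r, g, b in pixels:
        hist[hsv_bin(r, g, b)] += 1
        count += 1
    return [c / count for c in hist] if count else hist
```

Every image then costs 72 values per feature vector, which is the size the compact dominant-color representation used in this paper aims to reduce.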

Texture is also an important visual feature; it refers to the 
innate surface properties of an object and their relationship to 
the surrounding environment. Many objects in an image can be 
distinguished solely by their textures, without any other 
information. There is no universal definition of texture. Texture 
may consist of some basic primitives, and may also describe 
the structural arrangement of a region and its relationship to 
the surrounding regions [5]. In our approach, texture features 
are extracted using the gray-level co-occurrence matrix 
(GLCM). 
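
As a rough illustration of the GLCM-based texture description, the Python sketch below builds a normalised co-occurrence matrix for one pixel offset and derives the four features used later in this paper (energy, contrast, entropy and inverse difference). The feature formulas are the commonly used definitions and are an assumption here, as are the base-2 logarithm, the default offset of one pixel to the right, and the function names.

```python
import math


def glcm(image, levels, dx=1, dy=0):
    """Normalised co-occurrence matrix for pixel pairs offset by (dx, dy)."""
    h, w = len(image), len(image[0])
    counts = [[0] * levels for _ in range(levels)]
    total = 0
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                counts[image[y][x]][image[ny][nx]] += 1
                total += 1
    return [[c / total for c in row] for row in counts]


def texture_features(p):
    """Energy, contrast, entropy and inverse difference of a normalised GLCM."""
    energy = contrast = entropy = inv_diff = 0.0
    for i, row in enumerate(p):
        for j, v in enumerate(row):
            energy += v * v
            contrast += (i - j) ** 2 * v
            inv_diff += v / (1 + abs(i - j))
            if v > 0:
                entropy -= v * math.log2(v)
    return energy, contrast, entropy, inv_diff
```

The image is expected to hold integer gray levels in the range 0 to `levels - 1` and to contain at least one valid pixel pair for the chosen offset; gray levels are usually quantized to a small number of levels first to keep the matrix small.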

Our proposed CBIR system is based on dominant color 
[21] and GLCM [17] texture, with a focus on global 
features, because low-level visual features of images such 
as color and texture are especially useful for representing and 
comparing images automatically. For the concrete 
color and texture description, we use dominant colors and the 
gray-level co-occurrence matrix. The rest of the paper is organized 
as follows. Section II outlines the proposed method in terms 
of an algorithm. Section III deals with the experimental setup. 
Section IV presents the results. Section V presents the 
conclusions. 



II. PROPOSED METHOD 



Simple features alone cannot give a 
comprehensive description of image content. Combining 
color and texture features not only 
expresses more image information but also describes the image 
from different aspects in more detail, which 
yields better search results. The proposed method 
is based on the dominant color and texture features of an image. 
The retrieval algorithm is as follows: 

Step 1: Uniformly divide each image in the database and the 
target image into 8 coarse partitions, as shown in Fig. 1. 

Step 2: For each partition, select the centroid of the partition 
as its dominant color. 

Step 3: Obtain the texture features (energy, contrast, entropy and 
inverse difference) from the GLCM. 

Step 4: Construct a combined feature vector for color and 
texture. 

Step 5: Find the distances between the feature vector of the query 
image and the feature vectors of the target images using the weighted 
and normalized Euclidean distance. 

Step 6: Sort the Euclidean distances. 

Step 7: Retrieve the first 20 most similar images with minimum 
distance 
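
Steps 5 to 7 can be sketched in Python as follows. This is an illustration under stated assumptions rather than the exact procedure of the paper: min-max scaling is assumed for the normalisation, the weights are supplied by the caller, and the function names `normalize`, `weighted_distance` and `retrieve` are hypothetical. Feature extraction (Steps 1 to 4) is taken as already done, so each image is represented by a plain list of numbers.

```python
import math


def normalize(vectors):
    """Min-max normalise each feature dimension to [0, 1] across all vectors."""
    dims = len(vectors[0])
    lo = [min(v[d] for v in vectors) for d in range(dims)]
    hi = [max(v[d] for v in vectors) for d in range(dims)]
    return [[(v[d] - lo[d]) / (hi[d] - lo[d]) if hi[d] > lo[d] else 0.0
             for d in range(dims)] for v in vectors]


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two feature vectors."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


def retrieve(query, database, weights, top_k=20):
    """Return indices of the top_k database vectors closest to the query."""
    normed = normalize([query] + database)
    q, rest = normed[0], normed[1:]
    order = sorted(range(len(rest)),
                   key=lambda i: weighted_distance(q, rest[i], weights))
    return order[:top_k]
```

The weights control the relative influence of the color and texture components; for instance, giving all color dimensions one weight and all texture dimensions another lets the two feature groups be balanced against each other.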

A. Color feature representation 

In general, color is one of the most dominant and 
distinguishable low-level visual features for describing an image. 
Many CBIR systems, such as the QBIC system and VisualSEEk, 
employ color to retrieve images. In theory, extracting 
color features directly from the real color image leads to 
minimum error, but the 
computation cost and storage required expand rapidly, which 
makes this impractical. In fact, for a given color 
image, the number of actual colors occupies only a small 
proportion of the total number of colors in the whole color 
space, and further observation shows that a few dominant 
colors cover a majority of the pixels. Consequently, representing 
the image with these dominant colors reduces the image quality 
but does not affect the understanding of the image 
content. 

In the MPEG-7 Final Committee Draft, several color descriptors have been approved, including a number of histogram descriptors and a dominant color descriptor (DCD) [4, 6]. The DCD contains two main components: the representative colors and the percentage of each color. The DCD can provide an effective, compact, and intuitive representation of salient colors, and can describe the color distribution in an image or a region of interest. However, for the DCD in MPEG-7, the representative colors depend on the color distribution, and the greater part of the representative colors will be located in the higher color distribution range with smaller color distance. This may not be consistent with human perception, because human eyes cannot exactly distinguish colors that are close to each other. Moreover, DCD similarity matching does not fit human perception very well, and it causes incorrect ranks for images with similar color distributions. We adopt a new and efficient dominant color extraction scheme to address the above problems [7, 8].

According to numerous experiments, the selection of color space is not a critical issue for DCD extraction. Therefore, for simplicity and without loss of generality, the RGB color space is used. First, the RGB color space is uniformly divided into 8 coarse partitions, as shown in Fig. 1. If several colors are located in the same partition, they are assumed to be similar. After this coarse partition, the centroid of each partition is selected as its quantized color. Let $X = (X_R, X_G, X_B)$ represent the red, green, and blue color components of a pixel, and let $C_i$ be the quantized color for partition $i$.




Fig. 1 The coarse division of RGB color space. 

B. Extraction of dominant color of an image 

The procedure to extract dominant color of an image is as 
follows: 

As described in Section A, the RGB color space is uniformly divided into 8 coarse partitions (Fig. 1), and colors located in the same partition are assumed to be similar. After this coarse partition, the centroid of each partition ("color bin" in MPEG-7) is selected as its quantized color.

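Let $X = (X_R, X_G, X_B)$ represent the red, green, and blue color components of a pixel, and let $C_i$ be the quantized color for partition $i$. The average value of the color distribution for each partition can be calculated by

$$\bar{x}_i = \frac{\sum_{X \in \text{partition } i} X}{\sum_{X \in \text{partition } i} 1}$$

where the sums run over the pixels $X$ that fall in partition $i$.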

After the average values are obtained, each quantized color 
can be determined by using 

In this way, the dominant colors of an image will be obtained. 
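
The averaging step can be illustrated with a minimal Python sketch. It assumes that the 8 coarse partitions are the octants obtained by splitting each RGB channel at 128 (the text does not state the split points), and the function name `dominant_colors` is hypothetical. For each partition it returns the centroid (mean color) and the fraction of pixels that fall in it.

```python
def dominant_colors(pixels):
    """pixels: iterable of (r, g, b) tuples with values in 0..255.

    Returns a list of 8 (centroid, fraction) pairs; the centroid is None
    for partitions that contain no pixels.
    """
    sums = [[0.0, 0.0, 0.0] for _ in range(8)]
    counts = [0] * 8
    for r, g, b in pixels:
        # Assumption: each channel is split at 128, giving 2*2*2 = 8 octants.
        idx = (r >= 128) * 4 + (g >= 128) * 2 + (b >= 128)
        counts[idx] += 1
        sums[idx][0] += r
        sums[idx][1] += g
        sums[idx][2] += b
    total = sum(counts)
    result = []
    for s, n in zip(sums, counts):
        # Centroid = mean of the pixels that fell in this partition.
        centroid = tuple(v / n for v in s) if n else None
        result.append((centroid, n / total if total else 0.0))
    return result
```

Partitions with a large fraction correspond to the dominant colors; empty partitions can simply be skipped when building the color feature vector.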

C. Extraction of texture of an image 

Most natural surfaces exhibit texture, which is an 
important low level visual feature. Texture recognition will 
therefore be a natural part of many computer vision systems. 
In this paper, we propose a texture representation for image 
retrieval based on GLCM. 

GLCM [11, 13] is created in four directions with the 
distance between pixels as one. Texture features are extracted 
from the statistics of this matrix. Four GLCM texture features 
are commonly used which are given below: 

The GLCM is composed of probability values. It is defined by $P(i, j \mid d, \theta)$, which expresses the probability of a pair of pixels with gray levels $i$ and $j$ occurring at direction $\theta$ and distance $d$. When $\theta$ and $d$ are fixed, $P(i, j \mid d, \theta)$ is written $P_{i,j}$. The GLCM is a symmetric matrix, and its order is determined by the number of gray levels in the image. Elements of the matrix are normalized by the equation shown below:

$$P(i, j \mid d, \theta) = \frac{P(i, j \mid d, \theta)}{\sum_i \sum_j P(i, j \mid d, \theta)} \qquad (4)$$
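
As an illustration, the normalized co-occurrence matrix can be computed as in the Python sketch below. The function `glcm`, the `OFFSETS` table, and the convention of counting each pixel pair in both orders (which makes the matrix symmetric) are assumptions made for this example, not the authors' implementation. The image is a 2-D list of integer gray levels in `0..levels-1`.

```python
# Pixel offsets (dx, dy) for distance d = 1; y grows downward, so "up" is dy = -1.
OFFSETS = {0: (1, 0), 45: (1, -1), 90: (0, -1), 135: (-1, -1)}


def glcm(image, dx, dy, levels):
    """Return the normalized, symmetric GLCM of `image` for offset (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for y in range(rows):
        for x in range(cols):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < rows and 0 <= x2 < cols:
                i, j = image[y][x], image[y2][x2]
                counts[i][j] += 1
                counts[j][i] += 1  # count both orders so the matrix is symmetric
    total = sum(map(sum, counts))
    if total == 0:
        return counts
    # Normalization step: divide every count by the sum of all counts.
    return [[c / total for c in row] for row in counts]
```

Calling `glcm(image, *OFFSETS[angle], levels)` for each of the four angles gives the four matrices from which the texture features are extracted.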



The GLCM expresses the texture feature according to the correlation between the gray-level values of pixel pairs at different positions. It describes the texture feature quantitatively. In this paper, four texture features are considered: energy, contrast, entropy, and inverse difference.



Energy

$$E = \sum_x \sum_y P(x, y)^2 \qquad (5)$$

Energy measures the homogeneity of a gray-scale image, reflecting the uniformity of the gray-level distribution and the coarseness of the texture.



Contrast

$$I = \sum_x \sum_y (x - y)^2 P(x, y) \qquad (6)$$

Contrast is the moment of inertia about the main diagonal. It measures how the values of the matrix are distributed and the amount of local variation in the image, reflecting image clarity and the depth of the texture grooves. A large contrast represents deeper texture.



Entropy

$$S = -\sum_x \sum_y P(x, y) \log P(x, y) \qquad (7)$$

Entropy measures randomness in the image texture. Entropy is maximum when all entries of the co-occurrence matrix are equal, and it decreases as the entries become more uneven. A large entropy therefore implies that the gray-level distribution of the image is close to random.



Inverse difference

$$\text{Inverse difference} = \sum_x \sum_y \frac{1}{1 + (x - y)^2} P(x, y) \qquad (8)$$

It measures the amount of local change in the image texture. A large value indicates that the texture varies little between regions and is locally very uniform. Here $P(x, y)$ is the entry of the normalized GLCM at position $(x, y)$.
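
The four features can be computed from a normalized GLCM as in the following Python sketch. The function name `texture_features` is hypothetical, the natural logarithm is assumed for the entropy (the text does not give the base), and zero entries are skipped in the entropy sum by the usual convention that 0 log 0 = 0.

```python
import math


def texture_features(p):
    """p: normalized GLCM as a list of rows. Returns (energy, contrast, entropy, inverse difference)."""
    energy = contrast = entropy = inverse_difference = 0.0
    for x, row in enumerate(p):
        for y, v in enumerate(row):
            energy += v * v
            contrast += (x - y) ** 2 * v
            if v > 0:  # 0 * log(0) is treated as 0
                entropy -= v * math.log(v)
            inverse_difference += v / (1 + (x - y) ** 2)
    return energy, contrast, entropy, inverse_difference
```

A matrix concentrated on the main diagonal gives zero contrast and a large inverse difference, while mass far from the diagonal raises the contrast.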

The texture features are computed for an image with $d = 1$ and $\theta$ = 0°, 45°, 90°, 135°. In each direction four texture features are calculated. They are used as the texture feature descriptor. A combined feature vector of color and texture is then formed.

III. EXPERIMENTAL SETUP 



A. Data set 

Wang's [15] dataset, comprising 1000 Corel images with ground truth, is used. The image set comprises 100 images in each of 10 categories. The images are of size 256 x 384 or 384 x 256; the 384 x 256 images are resized to 256 x 384.

B. Feature set 

The feature set comprises the color and texture descriptors computed for an image as discussed in Section II.

C. Computation of similarity 

The similarity between the query and target image is measured from two types of features: dominant color and texture. Because the two types of features represent different aspects of the image, appropriate weights are used to combine them in the Euclidean similarity measure. We construct the Euclidean calculation model as follows:

$$D(A, B) = \omega_1 D(F_{CA}, F_{CB}) + \omega_2 D(F_{TA}, F_{TB}) \qquad (13)$$

Here $\omega_1$ is the weight of the color features, $\omega_2$ is the weight of the texture features, and $F_{CA}$ and $F_{CB}$ represent the normalized 72-dimensional color features for images A and B. For the GLCM-based method, $F_{TA}$ and $F_{TB}$ represent the 4-dimensional normalized texture features of images A and B. Here, we combine color features and texture features. Experiments show that $\omega_1 = \omega_2 = 0.5$ gives better retrieval performance.
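
The weighted distance and the subsequent ranking can be sketched in Python as follows. The function names and the dictionary layout of the database are hypothetical, and feature normalization is assumed to have been done beforehand; this is an illustration of the formula above, not the authors' MATLAB code.

```python
import math


def euclidean(u, v):
    """Plain Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def combined_distance(color_a, texture_a, color_b, texture_b,
                      w_color=0.5, w_texture=0.5):
    """D(A, B) = w_color * D(color features) + w_texture * D(texture features)."""
    return (w_color * euclidean(color_a, color_b)
            + w_texture * euclidean(texture_a, texture_b))


def rank(query, database, top_k=20):
    """query: (color, texture); database: dict mapping name -> (color, texture).

    Returns the names of the top_k images with the smallest combined distance.
    """
    scored = [(combined_distance(*query, *feats), name)
              for name, feats in database.items()]
    return [name for _, name in sorted(scored)[:top_k]]
```

With the default weights of 0.5 each, color and texture contribute equally, which matches the setting reported to work best in the experiments.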



IV. EXPERIMENTAL RESULTS



The experiments were carried out as explained in Sections II and III. The results are benchmarked against some existing systems using the same database [15]. The quantitative measure is given below:

$$p(i) = \frac{1}{100} \sum_{\substack{1 \le j \le 1000,\ r(i,j) \le 100, \\ ID(j) = ID(i)}} 1$$

where $p(i)$ is the precision of query image $i$, $ID(i)$ and $ID(j)$ are the category IDs of images $i$ and $j$ respectively, which are in the range 1 to 10, and $r(i, j)$ is the rank of image $j$ for query $i$. This value is the proportion of images belonging to the category of image $i$ among the first 100 retrieved images.

The average precision $p_t$ for category $t$ ($1 \le t \le 10$) is given by

$$p_t = \frac{1}{100} \sum_{\substack{1 \le i \le 1000, \\ ID(i) = t}} p(i)$$
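
A small Python sketch of these two measures is given below. The names `precision_at` and `category_average` are hypothetical; `category_of` is assumed to map an image ID to its category ID. The category average divides by the number of queries actually supplied, which equals 100 when every image of a category is used as a query, as in the formula above.

```python
def precision_at(ranked_ids, query_category, category_of, cutoff=100):
    """Fraction of the first `cutoff` retrieved images that share the query's category."""
    top = ranked_ids[:cutoff]
    hits = sum(1 for j in top if category_of[j] == query_category)
    return hits / cutoff


def category_average(precisions, category_of, t):
    """Mean of p(i) over all query images i whose category is t."""
    values = [p for i, p in precisions.items() if category_of[i] == t]
    return sum(values) / len(values)
```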



The comparison of the proposed method with other retrieval systems is presented in Table 1. These retrieval systems are based on HSV color, GLCM texture, and combined HSV color and GLCM texture. Our sub-blocks based retrieval system achieves higher average precision than these systems in every category of the database except Building.

The experiments were carried out on a Core i3, 2.4 GHz processor with 4 GB RAM using MATLAB. Fig. 2 shows the image retrieval results using HSV color, GLCM texture, HSV color and GLCM texture, and the proposed method. The image at the top left-hand corner is the query image and the other 19 images are the retrieval results.

The performance of a retrieval system can be measured in terms of its recall (or sensitivity) and precision (or positive predictive value). Recall measures the ability of the system to retrieve all images that are relevant, while precision measures the ability of the system to retrieve only images that are relevant. They are defined as

$$\text{Recall} = \frac{\text{Number of relevant images retrieved}}{\text{Total number of relevant images}}$$

$$\text{Precision} = \frac{\text{Number of relevant images retrieved}}{\text{Total number of images retrieved}}$$
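
The two definitions translate directly into code. The sketch below, with the hypothetical helpers `precision_recall` and `curve`, evaluates both measures at several numbers of returned images, which is the kind of data plotted against the number of returned images.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a list of retrieved IDs and a set of relevant IDs."""
    retrieved = list(retrieved)
    relevant = set(relevant)
    hits = sum(1 for item in retrieved if item in relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def curve(ranked, relevant, cutoffs):
    """Precision and recall after the first k results, for each k in cutoffs."""
    return [(k,) + precision_recall(ranked[:k], relevant) for k in cutoffs]
```

As more images are returned, recall can only stay the same or grow, while precision typically falls.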

Fig. 2 The image retrieval results (dinosaurs) using different techniques: (a) retrieval based on HSV color, (b) retrieval based on GLCM texture, (c) retrieval based on HSV color and GLCM texture, (d) retrieval based on the proposed method.

Table 1. Comparison of average precision obtained by the proposed method with other retrieval techniques.

                              Average precision
Class      HSV color   GLCM texture   HSV color +      Dominant color + GLCM
                                      GLCM texture     texture (proposed method)
Africa     0.26        0.21           0.25             0.27
Beaches    0.27        0.35           0.21             0.36
Building   0.38        0.5            0.24             0.25
Bus        0.45        0.22           0.51             0.52
Dinosaur   0.26        0.29           0.6              0.91
Elephant   0.3         0.24           0.26             0.38
Flower     0.65        0.73           0.81             0.89
Horses     0.19        0.25           0.28             0.47
Mountain   0.15        0.18           0.2              0.3
Food       0.24        0.29           0.25             0.32
Average    0.315       0.326          0.361            0.467



The following graph shows the comparison of average precision obtained by the proposed method with other retrieval systems.

[Figure: average precision versus class number (1 to 10) for dominant color + GLCM texture, HSV color + GLCM texture, and GLCM texture]

Fig. 3 Average precision of various image retrieval methods for 10 classes of Corel database.

The graph in Fig. 4 shows the comparison of average precision obtained by the proposed method with other retrieval systems, and the graph in Fig. 5 shows the comparison of average recall obtained by the proposed method with other retrieval systems.

[Figure: average precision versus number of returned images for dominant color + GLCM texture, HSV color + GLCM texture, and GLCM texture]

Fig. 4 Average precision of various image retrieval methods.

[Figure: average recall for dominant color + GLCM texture, HSV color + GLCM texture, and GLCM texture]

Fig. 5 Average recall of various image retrieval methods.

V. CONCLUSION 

CBIR is an active research topic in image processing, pattern recognition, and computer vision. In this paper, a CBIR method has been proposed which uses a combination of dynamic dominant color and GLCM texture descriptors. Experimental results showed that the proposed method yielded higher average precision and average recall with a reduced feature vector dimension. In addition, the proposed method almost always showed a gain in average retrieval time over the other methods. In further studies, the proposed retrieval method is to be evaluated on a wider variety of databases.

Abstract: Predicting software defects in modules not only helps in maintaining legacy systems but also helps the software development process and ensures higher reliability. Advantages include better planning of project resources and minimization of budget. Research has been carried out using statistical methodology and machine learning techniques which are generic in nature. Dependence on legacy software systems to meet current demanding requirements is a major challenge for any IT administrator, as is estimating the cost of maintaining them. In this paper, it is proposed to modify the existing multilayer perceptron neural network, a popular supervised classification algorithm, to predict defects in a given module based on the available software metrics.

Keywords: Legacy software, Software metrics, Software reliability, Classification, Multilayer Perceptron Neural network, Fault-proneness.



I. INTRODUCTION 



Software reliability and software quality assurance are two major areas in software engineering which ensure high quality software. Both concepts are applied throughout the development and maintenance process. The notable major activities used are performance analysis, functional tests, quantifying time and budget, and measurement of metrics [1]. In addition, code reviews, key personnel assignment and automatic test-case generation are other strategies that are applied to reach high reliability [2].

Software quality can be viewed from different perspectives including time, budget and mean time to failure. Alpha and beta testing help to improve the quality of software but do not ensure zero defects, and they are a very expensive proposition if not planned properly.

Software quality modeling becomes an important criterion to ensure that the software not only meets the desired quality but also stays within time and budget lines. Defect prediction based on quantifiable metrics, though controversial, has been used successfully to predict defects in modules. Defect prediction models have independent variables captured in the form of product and process metrics and one dependent variable which indicates whether there could be a fault or no fault in the module. Typically researchers have used product metrics extensively to predict faults in the modules. The independent variables used for prediction of defects can be parameters captured in previous projects, which are available in the configuration management system, or can be computed from the current project.

Predicting module defects also finds application in legacy systems where it may not be possible to replace legacy systems through the practice of application retirement. Defect prediction provides a cost-effective process to enhance them.



The previous work carried out by the author [3] investigates the KC1 dataset for defect classification using decision tree induction and Bayesian networks. Various pre-processing techniques were also investigated [4]. The results obtained are tabulated in Tables I and II.

TABLE I. CLASSIFICATION ACCURACY ON KC1 DATASET

KC1 dataset                    Correctly        Incorrectly      Mean absolute
                               classified (%)   classified (%)   error
Random tree                    81.86            18.14            0.1924
CART                           84.91            15.09            0.2095
Bayesian logistic regression   86.03            13.97            0.1397



TABLE II. CLASSIFICATION ACCURACY AFTER PREPROCESSING IN KC1 DATA SET

                      Correctly        Incorrectly
                      classified (%)   classified (%)
Random tree           94.5531          5.4469
Logistic regression   95.6704          4.3296
CART                  96.7877          3.2123

In this paper, the efficacy of neural networks for defect prediction using an available model and our proposed model is verified.

This paper is organized into the following sections. Section II describes software metrics, Section III describes data mining techniques for classification, Section IV gives an introduction to the neural networks used, Section V describes the dataset used in the work, and Section VI includes the improved neural network technique and the output obtained. The last section analyses and concludes the paper.

II. SOFTWARE METRICS

Software metrics are collected at various phases of the software development process. These metrics contain information about the software and can be used to predict software quality in the early stages of the software life cycle.

Software reliability engineering is one of the most important aspects of software quality. Recent studies show that software metrics can be used in software module fault-proneness prediction. A software module has a series of metrics, some of which are related to fault-proneness. Multiple research works on software quality prediction using the relationship between software metrics and a software module's fault-proneness have been done in the last decades. There are several techniques proposed to classify the modules for identifying fault-prone modules.

III. DATA MINING TECHNIQUES

Data Mining (DM) aims to establish something new from the facts recorded in databases. Originally, data mining was a statistician's term for overusing data to draw illegitimate inferences. DM is the use of powerful tools to sift out important or significant traits that are previously unknown from databases or data warehouses.

Software is prone to errors and bugs. The purpose of software testing is to assess the quality of computer software and verify whether the software complies with the software specification and customer needs. There are two ways to find errors in software testing: manual and automated. Manual debugging is labour-intensive and costly, while automated debugging can classify and locate software defects automatically. Data mining based software debugging is becoming more and more accepted, and it can significantly reduce the labour cost of software debugging.

Data mining extracts useful information and knowledge from huge amounts of data. DM methods can be applied to the data generated in every stage of the software life cycle, such as design, development, testing, deployment and maintenance, to extract potential errors in the software.

IV. NEURAL NETWORKS 

Neural networks consist of multiple layers of 
computational units, usually interconnected in a feed- 
forward way. Each neuron in one layer has directed 
connections to the neurons of the subsequent layer. In 
many applications the units of these networks apply a 
sigmoid function as an activation function. 

The feed forward neural network was the first and arguably simplest type of artificial neural network devised. As the majority of faults are found in a few of a system's modules, there is a need to investigate the modules that are affected more severely than others and to carry out proper maintenance on time, especially for critical applications (Ebru Ardil et al., 2009).

Algorithms based on neural networks have many applications in knowledge engineering. In data mining, the following neural network architectures are used:

• Multilayered feed forward neural networks

• Kohonen's self-organizing maps

A) Multilayered feed forward neural networks

Multilayered feed forward neural networks (ANNs) are non-parametric regression methods, which approximate the underlying functionality in data by minimizing a loss function. The common loss function used for training an ANN is the quadratic error function. ANNs are adapted by supervised learning. The database forms a training set. During training, specified items of data records are presented as the input of the neural network and its weights are changed in such a way that its output approximates the values in the data set. After the learning process finishes, the learned knowledge is represented by the values of the neural network weights. For training, the back-propagation of error algorithm is often used.

[Figure: inputs X1, X2, ..., Xn feeding the hidden layers and the output layer]

Fig. 1. Multilayer Neural Network

B) Kohonen's self-organizing maps

Kohonen's self-organizing maps (SOMs) have become a promising technique in cluster analysis. They are adapted by unsupervised learning. In data mining, cluster techniques based on Kohonen's self-organizing maps have the following advantages over standard statistical methods.

DM typically deals with high-dimensional data. A record in a database typically consists of a large number of items. The data do not have a regular multivariate distribution, and thus the traditional statistical methods have their limitations and are not effective. SOMs work with high-dimensional data efficiently.

A SOM is a dynamic system which learns abstract structure in a high-dimensional input space using a low-dimensional space for representation.

Kohonen's self-organizing maps provide a means for visualization of multivariate data, because two clusters of similar members activate output neurons with a small distance in the output layer. In other words, neurons that are topologically close will be sensitive to inputs that are similar. No other algorithm of cluster analysis has this property.

V. DATA SET 

Data from NASA's Metrics Data Program (MDP) data repository is used. The KC1 dataset used contains LOC measures, cyclomatic complexity, base Halstead measures, and derived Halstead measures from various software modules.

The attributes used in this work are described briefly below.

LOC_BLANK - The number of blank lines in a module.
LOC_CODE_AND_COMMENT - The number of lines which contain both code and comment in a module.
LOC_COMMENTS - The number of lines of comments in a module.
CYCLOMATIC_COMPLEXITY - The cyclomatic complexity of a module.
DESIGN_COMPLEXITY - The design complexity of a module.
ESSENTIAL_COMPLEXITY - The essential complexity of a module.
LOC_EXECUTABLE - The number of lines of executable code for a module (not blank or comment).
HALSTEAD_CONTENT - The Halstead length content of a module.
HALSTEAD_DIFFICULTY - The Halstead difficulty metric of a module.
HALSTEAD_EFFORT - The Halstead effort metric of a module.
HALSTEAD_ERROR_EST - The Halstead error estimate metric of a module.
HALSTEAD_LENGTH - The Halstead length metric of a module.
HALSTEAD_LEVEL - The Halstead level metric of a module.
HALSTEAD_PROG_TIME - The Halstead programming time metric of a module.
HALSTEAD_VOLUME - The Halstead volume metric of a module.
NUM_OPERANDS - The number of operands contained in a module.
NUM_OPERATORS - The number of operators contained in a module.
NUM_UNIQUE_OPERANDS - The number of unique operands contained in a module.
NUM_UNIQUE_OPERATORS - The number of unique operators contained in a module.
LOC_TOTAL - The total number of lines for a given module.

VI. PROPOSED METHODOLOGY & EXPERIMENTAL INVESTIGATION

The Multilayer Perceptron is an example of a supervised learning artificial neural network that is used extensively for the solution of a number of different problems, including classification, pattern recognition and interpolation. The algorithm for perceptron learning is based on the back-propagation rule. The hidden layer typically uses either a sigmoid or a tanh function. The algorithm for the multilayer perceptron neural network is given below.

i. Present input and desired output
Present input $Y_p = y_0, y_1, y_2, \dots, y_{n-1}$ and target output $C_p = c_0, c_1, \dots, c_{m-1}$, where n is the number of input nodes and m is the number of output nodes.

ii. Calculate the actual output
Each layer calculates the following:

$$x_{pj} = f[w_0 y_0 + w_1 y_1 + \dots + w_n y_n]$$

This is then passed to the next layer as an input. The final layer outputs values $o_{pj}$.

iii. Adapt weights
Starting from the output, we now work backwards:

$$w_{ij}(t+1) = w_{ij}(t) + \eta \delta_{pj} o_{pj}$$

where $\eta$ is a gain term and $\delta_{pj}$ is an error term for pattern p on node j.

For output units

$$\delta_{pj} = k o_{pj} (1 - o_{pj}) (t_{pj} - o_{pj})$$

For hidden units

$$\delta_{pj} = k o_{pj} (1 - o_{pj}) [\delta_{p0} w_{j0} + \delta_{p1} w_{j1} + \dots + \delta_{pk} w_{jk}]$$

where the sum is over the k nodes in the layer above node j.

In this paper, a fuzzy bell hidden layer is proposed that uses a bell-shaped curve as its fuzzy membership function in the hidden layer and is given by

[Equation garbled in the source: a bell-shaped function of the form 1 / (1 + ...)]

where y is the input and w is the weight. The L2 criterion is used to compute the cost function. The error used by the supervised learning procedure is the squared Euclidean distance between the network's output and the desired response.

65 percent of the data was used as the training set and the remainder as the test set. The classification accuracy obtained on the KC1 dataset is 98.2%.

The proposed fuzzy based neural model was able to classify better than Random Tree by 14.66%, CART by 11.41% and Bayesian logistic regression by 10.50%. However, the proposed method needs to be evaluated with other datasets to better test its performance in terms of consistency.

The results obtained by our proposed methodology improve on the regular multilayer perceptron model with sigmoidal hidden function by 3.92%. Figure 2 displays the accuracy obtained by the various classification methods.

FIG. 2. CLASSIFICATION ACCURACY ON KC1 DATA SET
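
The update rules of this section can be sketched in Python as below. All function names are hypothetical. Because the exact bell function cannot be recovered from the text, `generalized_bell` uses the standard generalized bell membership function 1 / (1 + |(y - w) / a|^(2b)) purely as an assumption, with the width `a` and slope `b` as illustrative parameters; it should not be read as the authors' formula.

```python
import math


def sigmoid(x):
    """Conventional sigmoidal activation used by the regular MLP."""
    return 1.0 / (1.0 + math.exp(-x))


def generalized_bell(y, w, a=1.0, b=1.0):
    """Assumed bell-shaped activation centred on the weight w (see lead-in)."""
    return 1.0 / (1.0 + abs((y - w) / a) ** (2 * b))


def output_delta(o, t, k=1.0):
    """Error term for an output unit: k * o * (1 - o) * (t - o)."""
    return k * o * (1.0 - o) * (t - o)


def hidden_delta(o, deltas_above, weights_to_above, k=1.0):
    """Error term for a hidden unit: k * o * (1 - o) * sum(delta * w) over the layer above."""
    back = sum(d * w for d, w in zip(deltas_above, weights_to_above))
    return k * o * (1.0 - o) * back


def update_weight(w, eta, delta, o):
    """w(t+1) = w(t) + eta * delta * o."""
    return w + eta * delta * o
```

The delta formulas above assume the derivative of a sigmoid unit; a hidden layer that actually used the bell function would need the derivative of that function in place of o(1 - o).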



CONCLUSION 

In this paper, it has been observed that the proposed bell fuzzy based neural network model performs better than the existing neural network model and other classification algorithms. Thus it can be said that the bell fuzzy function used in a multilayer perceptron neural network improves the classification accuracy of software defect prediction.

Abstract- The task of the knowledge discovery and data mining process is to extract knowledge from data such that the resulting knowledge is useful in a given application. Obviously, only the user can determine whether the resulting knowledge satisfies this requirement. Moreover, what one user may find useful is not necessarily useful to another user. Visual data mining tackles the data mining tasks from this perspective, enabling human involvement and incorporating the perceptivity of humans. The objective of this paper is to present student performance through a visual mining method applied to data from an educational institute. This method, together with the novel visualization technique described here, allows the analyst to explore data and view significant differences among the performance values of students. The results are immediately presented in graphical form, and the user can change settings in order to iteratively explore the data and find useful knowledge.

I. INTRODUCTION 

For data mining [1] to be effective, it is important to include the human in the data exploration process and combine the flexibility, creativity, and general knowledge of the human with the enormous storage capacity and computational power of today's computers. Visual data exploration aims at integrating the human in the data exploration process, applying human perceptual abilities to the large data sets available in today's computer systems. The basic idea of visual data exploration is to present the data in some visual form, allowing the human to gain insight into the data, draw conclusions, and directly interact with the data. Visual data mining techniques have proven to be of high value in exploratory data analysis, and they also have a high potential for exploring large databases. These huge databases contain a wealth of data and constitute a potential goldmine of valuable information. As new courses and new colleges emerge, the structure of the educational database changes. Finding the valuable information hidden in those databases and identifying and constructing appropriate models is a difficult task. Data mining techniques play an important role at each step of the information discovery process, and visual data exploration usually allows faster data exploration and often provides better results, especially in cases where automatic algorithms fail. In addition, visual data exploration techniques provide a much higher degree of confidence in the findings of the exploration. This fact leads to a high demand for visual exploration techniques and makes them indispensable in conjunction with automatic exploration techniques.

The main contribution of this study is to address the capabilities and strengths of data mining technology in identifying the placement of students, and to guide teachers to concentrate on the appropriate associated attributes and to counsel the students or arrange suitable placements for them. In this work, we propose a dynamic framework for association rule mining that integrates interactive visualization techniques in order to allow users to drive the association rule finding process, giving them control and visual cues to ease understanding of both the process and its results.

II. ASSOCIATION RULE MINING (ARM) 

Association Rules Mining (ARM) [2] can be divided into two sub-problems: the generation of the frequent itemsets lattice and the generation of association rules. The complexity of the first sub-problem is exponential. Let $|I| = m$ be the number of items; the search space to enumerate all possible frequent itemsets is equal to $2^m$, and so exponential in $m$ [2]. Let $I = \{a_1, a_2, \dots, a_m\}$ be a set of items, and let $T = \{t_1, t_2, \dots, t_n\}$ be a set of transactions establishing the database, where every transaction $t_i$ is composed of a subset $X \subseteq I$ of items. A set of items $X \subseteq I$ is called an itemset. A transaction $t_i$ contains an itemset $X$ in $I$ if $X \subseteq t_i$. Several published ARM papers are based on two main indices, support and confidence [2]. The support of an itemset is the percentage of transactions in the database that contain this itemset as a subset. The confidence is the conditional probability that a transaction contains an itemset given that it contains another itemset.

An itemset is frequent if support(X) ≥ minsup, where minsup is the user-specified minimum support. An association rule r is strong if confidence(r) ≥ minconf, where minconf is the user-specified minimum confidence. The left part of an association rule is called the antecedent and the right part is called the conclusion. Our motivations are described hereafter.
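
As a small worked illustration of the two indices, the Python sketch below computes support and confidence over a list of transactions. The function names and the toy data are hypothetical; transactions are represented as sets of items.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(antecedent, conclusion, transactions):
    """Conditional probability of the conclusion given the antecedent."""
    a = frozenset(antecedent)
    both = a | frozenset(conclusion)
    return support(both, transactions) / support(a, transactions)


# Toy database of four transactions over the items a, b, c.
T = [frozenset("ab"), frozenset("abc"), frozenset("ac"), frozenset("b")]
print(support("a", T))          # item a appears in 3 of the 4 transactions
print(confidence("a", "b", T))  # share of transactions with a that also contain b
```

A rule such as a → b would then be reported as strong only if both values reach the user-specified thresholds minsup and minconf.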

III. MOTIVATION

The number of generated rules is a major problem in association rule mining. This number is too large and leads to another problem called knowledge mining. The human cycles spent in analyzing knowledge are the real bottleneck in data mining. This issue can limit the final user's expertise because of the strong cognitive activity required. To solve it, visual data mining became an important research area. Indeed, extracting relevant information is very difficult when it is hidden in a large amount of data. Visual data mining attempts to improve the KDD process by offering adapted visualization tools which allow tackling various known problems. Those tools can use several kinds of visualization techniques which simplify the acquisition of knowledge by the human mind, which can then handle more data visually and extract relevant information quickly.

Indeed, in most real-life databases, thousands and even
millions of high-confidence rules are generated, among which
many are redundant. In this paper, we are interested in the
most widely used category of visualization in data mining,
i.e., using visualization techniques to present the
information extracted by the mining process. Visualization
tools become more appealing when handling large data sets with
complex relationships, since information presented in the form
of images is more direct and more easily understood by humans.
Visualization tools allow users to work in an interactive
environment in which rules are easy to understand. In a
table-based view of association rules, all strong rules are
presented in a tabular format (rule table) in which each entry
corresponds to a rule. The rules can be displayed in different
orders, such as by premise, conclusion, support or confidence.
This helps users to get a clearer view of the rules and to
locate a particular rule more easily.
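The rule table ordering described above can be sketched in a few lines of Python. The `Rule` record, the `sort_rules` helper and the three sample rules are hypothetical and serve only to illustrate sorting by premise, conclusion, support or confidence; they do not reproduce the authors' implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Rule:
    premise: Tuple[str, ...]   # antecedent items, e.g. ("ATT=Good",)
    conclusion: str
    support: float
    confidence: float


SORT_KEYS = {"premise", "conclusion", "support", "confidence"}


def sort_rules(rules: List[Rule], key: str, descending: bool = False) -> List[Rule]:
    """Return a new list of rules ordered by one column of the rule table."""
    if key not in SORT_KEYS:
        raise ValueError(f"unknown sort key: {key}")
    return sorted(rules, key=lambda r: getattr(r, key), reverse=descending)


# Hypothetical rule table entries (values invented for illustration).
rule_table = [
    Rule(("ATT=Good",), "MARK=A", 0.40, 0.80),
    Rule(("ATT=Poor",), "MARK=C", 0.15, 0.90),
    Rule(("ACT=A", "ATT=Good"), "MARK=A", 0.25, 0.70),
]

by_confidence = sort_rules(rule_table, "confidence", descending=True)
by_premise = sort_rules(rule_table, "premise")
```

Returning a new list rather than sorting in place keeps the original table order available, which is convenient when the user switches between orderings.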

IV. VISUAL DATA MINING 

The rise of KDD revealed new problems such as knowledge
mining. These large amounts of knowledge must be explored
with specific advanced tools. Indeed, expertise requires
substantial cognitive work and, consequently, a costly loss of
time for industry. Extracting nuggets is a difficult task when
relevant information is hidden in a large amount of data. In
order to tackle this issue, visual data mining was conceived to
propose visual tools adapted to several well-known KDD
tasks. These tools contribute to the effectiveness of the
processes by giving understandable representations while
facilitating interaction with experts. Visual data mining is
present throughout the KDD process: upstream, to apprehend the
data and to carry out the first selections; during the mining;
and downstream, to evaluate the obtained results and to display
them. Visual tools have become major components because of the
increasing role of the expert within the KDD process. Visual
data mining integrates concepts from various domains such as
visual perception, cognitive psychology, visualization
metaphors, information visualization, etc.



We focus on visualization during the post-processing stage,
and we are interested in ARM. Independently of both context
and task, ARM has a main drawback, which is the high number
of generated rules. Several works on filtering rules have been
proposed, and a state of the art was presented in [3]. Although
filtering reduces the set of generated rules significantly,
their number remains large. The expert must be able to
interact easily with a data mining environment in order to
understand the displayed results more easily. This point is
essential for the global performance of the system. Visual
tools for association rules have been proposed to reduce this
cognitive load, but they remain limited [3].

V. VISUAL ASSOCIATION RULE MINING 

Various works already exist to help expert analysis in text
mode [4]. Several works on visual rule exploration have been
published [2], [5], [6], [7]. The main principles of our
interactive ARM are described hereafter. All these tools rely
on textual, 2D or 3D representations. Choosing one of them
proves to be difficult. Moreover, their interpretation can vary
according to the expert. Each of these techniques presents
advantages and drawbacks, which must be taken into account in
the initial choice of representation. The effectiveness of
these approaches depends on the input data files. These
representations are understandable for small quantities of data
but become complex when the quantities increase. Indeed,
particular information may not be sufficiently perceptible in
the mass. The common limitation of all the representations is
that if they are global, they quickly become unreadable (size
of the objects in 2D, occlusions in 3D), and if they are
detailed, they do not provide the expert with an overall
picture of the data.

VI. RELATED WORK 

Traditionally, many simple methods have been designed to
render small amounts of data or statistical features of big
data sets, such as histograms, pie charts, trees, etc. To
visualize more complex data, modern scientific visualization
utilizes more advanced techniques. Visualization techniques
such as EXVIS [8], Chernoff Faces [9], icons [10] and m-Arm
Glyph [11] are often called glyph-based methods. Glyphs are
graphical entities whose visual features, such as shape,
orientation, color and size, are used to encode attributes of
an underlying dataset, and glyphs are often used for
interactive exploration of data sets [12]. Glyph-based
techniques range from representation via individual icons to
the formation of texture and color patterns through the overlay
of many thousands of glyphs [13]. Chernoff used facial
characteristics to represent information in a multivariate
dataset [14]. Each dimension of the data set encodes one facial
feature, such as nose, eyes, eyebrows, mouth, or jowls.
Glyphmaker, proposed by Foley and Ribarsky, visualizes
multivariate datasets in an interactive fashion [14]. Levkowitz
described a prototype system for combining colored squares to
produce patterns that represent an underlying multivariate
dataset [15]. In [10] an icon encodes six dimensions by six
lines of different colors within a square icon. In [13]
Levkowitz describes the combination of textures and colors in a
visualization system. The m-Arm Glyph by Pickett and Grinstein
[11] consists of a main axis and m arms; the length and
thickness of each arm and the angle between each arm and the
main axis are used to encode different dimensions of a data
set. A glyph-based system for large high-dimensional datasets
is described in [6]. These techniques are incapable of
visualizing large amounts of high-dimensional data because of:

• Lack of human computer interaction. 

• Lack of integration with other data mining and 
knowledge discovery (KDD) tools. 

VII. PROPOSED WORK 

Nowadays, higher educational organizations face a highly
competitive environment and aim to gain competitive advantages
over their competitors. These organizations should improve
their methodology for teaching, placement and counseling of
students. They consider students and teachers as their main
assets, and they want to improve their key process indicators
through effective and efficient use of these assets.

Students' academic performance is critical for educational
institutions because strategic programs can be planned to
improve or maintain students' performance during their
period of study in the institution. The academic
performance in this study is measured by the attributes
indicated in Table 1. This study presents the use of data
mining in predicting the final placement of students. It
applies the association rule mining technique to obtain the
best prediction and analysis. The list of students who are
predicted as likely to fall short of the selection criterion
is then turned over to teachers and management for
direct or indirect intervention.

For example, let us consider the transaction database of a
few students from the students' repository of an institute,
which shows the students' general and academic grades in the
different courses they enrolled in during their years of
attendance at the institution. A student's performance score is
basically determined by the sum of the continuous assessment
and the examination scores. In most institutions the continuous
assessment, which includes various assignments, class tests and
group presentations, is weighted at 30% of the total score,
while the main semester examination is weighted at 70%. To
differentiate the students' performances, we have selected
attributes such as attendance, mark, activity, etc., as shown
in Table 1.

Educational institutions with Association rule mining can 
predict the student's performance more accurately, which in 
turn can result in quality education. 

A. Student Level Analysis

Successfully training students requires analyzing the
data at the student level. Using the association discovery data
mining technique, educational institutions can more accurately
select the kind of training to offer to different kinds of
students. With the help of this technique, educational
institutions can:

i. Segment the student database to create student 
profiles. 

ii. Conduct analysis on a single student segment for a 
single factor. For example, the institution can perform 
in-depth analysis of the relationship between 
attendance and academic achievement. 

iii. Analyze the student segments for multiple factors
using group processing and multiple target variables.
For example: what characteristics are shared by
students who drop out of college?

iv. Perform sequential (over time) basket analysis on
student segments. For example: what percentage of
students with high attendance also achieve
academically?

B. Developing new strategies 

Teachers can increase the placement percentage by
identifying the most lucrative student segments and organizing
the training sessions accordingly. The results may suffer if
teachers do not offer the right kind of training to the right
student segment at the right time. With data mining operations
such as segmentation or association analysis, institutions can
now utilize all of their available information for the
betterment of students.

TABLE I ATTRIBUTE LIST

ATTRNAME                                 ATTR   Possible Values
Enrolment No.                            ENR    Yes, No
Attendance                               ATT    Poor, Good, Average
10+2 Grade                               INT    A, B, C
Area of expertise                        EXP    M, C, E
Gender                                   G      M, F
Fund                                     F      P, S, F
Student Department                       STD    ME, CS, IT
Activities performed by the student      ACT    A, B, C
Percentage of practical session          PSA    A, B, C
Exercise given by teacher                ET     A, B, C
Average mark of the experience report    ER     A, B, C
Final mark                               MARK   A, B, C
Evaluation                               EVL    A, B, C
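To make the link between the attribute list and a transaction database concrete, the sketch below turns one student record into a set of "ATTR=value" items that an association rule miner can consume. The dictionary `ATTRIBUTE_VALUES` copies the codes and possible values of Table I; the function name `to_transaction` and the sample record are assumptions made for illustration.

```python
from typing import Dict, FrozenSet

# Attribute codes and allowed values, copied from Table I.
ATTRIBUTE_VALUES = {
    "ENR": {"Yes", "No"},
    "ATT": {"Poor", "Good", "Average"},
    "INT": {"A", "B", "C"},
    "EXP": {"M", "C", "E"},
    "G": {"M", "F"},
    "F": {"P", "S", "F"},
    "STD": {"ME", "CS", "IT"},
    "ACT": {"A", "B", "C"},
    "PSA": {"A", "B", "C"},
    "ET": {"A", "B", "C"},
    "ER": {"A", "B", "C"},
    "MARK": {"A", "B", "C"},
    "EVL": {"A", "B", "C"},
}


def to_transaction(record: Dict[str, str]) -> FrozenSet[str]:
    """Encode a student record as a set of 'ATTR=value' items."""
    items = set()
    for attr, value in record.items():
        allowed = ATTRIBUTE_VALUES.get(attr)
        if allowed is None:
            raise KeyError(f"unknown attribute: {attr}")
        if value not in allowed:
            raise ValueError(f"value {value!r} not allowed for {attr}")
        items.add(f"{attr}={value}")
    return frozenset(items)


# Hypothetical student record (not taken from the paper's data).
student = {"ENR": "Yes", "ATT": "Good", "STD": "IT", "MARK": "A", "EVL": "A"}
transaction = to_transaction(student)
```

Validating values against the table at encoding time keeps misspelled categories from silently becoming new items during mining.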
VIII. SYSTEM ARCHITECTURE

Figure 1. System Architecture

The system architecture is shown in Figure 1. The database
resides on the server machine, and the stored procedures
(Oracle) also reside on the server side. Our VB application
runs on the client machine. It consists of several modules:
Login, Rule Generator, and Visualization. The Login module is
used to connect to the database server. The Rule Generator is
used to mine the association rules from the information
provided by the user. The Visualization module consists of two
sub-modules: Rule Table and 3-D Visualization. These modules
can be accessed from the main window.

Knowledge Extraction Stage 

Rendering millions of icons is computationally expensive,
and the interpretation and analysis to be performed by the user
are even harder. A visualization system has to provide not only
a "loyal" picture of the original dataset, but also an
"improved" picture that makes interpretation and knowledge
extraction easier for the viewer. Integration of analysis
functionality is important and necessary to help the viewer
extract knowledge from the display. The basic requirement for
a visualization system can be stated as:

"Different data values should be visualized differently, 
and the more different the data values are, the more 
different they should look". 

However, what a viewer wants to find with a visualization
system is not the data values themselves but the
information or knowledge represented by those values. So, the
above requirement can be better stated as:

"Different information should be visualized differently, 
and the more different the information is, the more 
different it should look". 

To help a viewer with knowledge extraction, a visualization
system has to deal with the problem of non-uniform
knowledge/information distribution. It is common in some
data sets or fields that a small difference in a value can mean
a big difference in meaning, which shows that knowledge and
information are not distributed uniformly across data values.
A user would like a visualization system to be able to show
these knowledge differences clearly. To be specific, two
differences of the same amount in data values need not be
rendered by identical differences in visual elements on the
screen. Instead, the difference representing more information
should be displayed more prominently to attract the viewer's
attention.
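One way to picture such a non-uniform rendering is a piecewise-linear transformation from data values to a visual quantity such as brightness. The sketch below is an assumption-laden illustration: the helper `make_transform`, the control points and the idea that marks between 40 and 50 carry the most information are all hypothetical and are not part of the system described here.

```python
from bisect import bisect_right
from typing import Callable, List, Tuple


def make_transform(control_points: List[Tuple[float, float]]) -> Callable[[float], float]:
    """Build a piecewise-linear map from a data value to a visual value."""
    pts = sorted(control_points)
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    if len(pts) < 2 or any(a == b for a, b in zip(xs, xs[1:])):
        raise ValueError("need at least two control points with distinct data values")

    def transform(value: float) -> float:
        if value <= xs[0]:
            return ys[0]
        if value >= xs[-1]:
            return ys[-1]
        i = bisect_right(xs, value) - 1          # segment that contains `value`
        t = (value - xs[i]) / (xs[i + 1] - xs[i])
        return ys[i] + t * (ys[i + 1] - ys[i])

    return transform


# Hypothetical domain knowledge: marks in the 40..50 band matter most,
# so that band is given half of the visual range (0.2 to 0.7).
to_visual = make_transform([(0, 0.0), (40, 0.2), (50, 0.7), (100, 1.0)])

near_boundary = to_visual(50) - to_visual(45)   # 5 marks inside the 40..50 band
far_from_it = to_visual(80) - to_visual(75)     # 5 marks inside the 50..100 band
```

Two equal differences of five marks are thus rendered with different visual differences, and a domain expert could move the control points interactively to change where the emphasis falls.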

Interactive Visualization Model 
Figure 2. Visualization Model

In Figure 2 we give an interactive visualization model which
has the following properties:

1) Interaction: Integrating domain knowledge into a
visualization system is very important because of the
problem of non-uniform knowledge distribution. In a
visualization system, this integration can be achieved by
choosing proper association and transformation functions
during the visualization process. However, there is no
universal technique for all fields, data sets or users, so a
visualization system should be interactive and provide a
mechanism for viewers to adjust or change the association
and transformation functions during the visualization
process. Each data set or field has to be studied
individually and visualized interactively before its
important information can be revealed, which can only be
done by viewers or domain experts. Through interaction, a
viewer can guide a visualization system step by step to
display what he or she is interested in more and more
clearly.

2) Correctness: We propose the following criteria for 
"correct" visualization: 

a) If possible, a visualization system should show
different dimensions of a data set differently,
through different visual objects or different visual
elements of one visual object.

b) The more different the values are, the more
differently they should be rendered. Since we may
not know the distribution of a dataset, directly
assigning data values to visual elements/properties
may not make full use of the available visual
elements/properties; a clustering step is therefore
preferred.

c) The more different the information represented by
data values is, the more differently it should be
rendered. A distinct visual difference between
different pieces of information helps viewers, and
it can be achieved by interaction between a
visualization system and viewers. In this
interaction process, viewers can fine-tune the
transformation between data values and visual
elements, and domain knowledge is obtained and
reflected through a more customized display.

3) "Maximizing" rule: To optimize the rendering quality, the 
maximal range of visual objects/elements should be used as 
default settings. 
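Criterion b) and the "maximizing" rule can be illustrated with two small mappings from data values onto a visual range. This is only a sketch under stated assumptions: `min_max_scale` stretches values linearly over the whole range, and `rank_scale` is a simple rank-based stand-in for the clustering step mentioned above, not a clustering algorithm and not the authors' method. The sample values are invented.

```python
from typing import List, Sequence


def min_max_scale(values: Sequence[float], lo: float = 0.0, hi: float = 1.0) -> List[float]:
    """Linearly stretch values so that they span the full visual range [lo, hi]."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        return [(lo + hi) / 2 for _ in values]
    return [lo + (v - vmin) * (hi - lo) / (vmax - vmin) for v in values]


def rank_scale(values: Sequence[float], lo: float = 0.0, hi: float = 1.0) -> List[float]:
    """Spread the distinct values evenly over [lo, hi] according to their rank."""
    distinct = sorted(set(values))
    if len(distinct) == 1:
        return [(lo + hi) / 2 for _ in values]
    step = (hi - lo) / (len(distinct) - 1)
    level = {v: lo + i * step for i, v in enumerate(distinct)}
    return [level[v] for v in values]


# Skewed toy data: one outlier squeezes the other values together
# when a purely linear mapping is used.
skewed = [1, 2, 3, 4, 1000]
linear = min_max_scale(skewed)
ranked = rank_scale(skewed)
```

With the linear mapping the first four values occupy well under one percent of the visual range, whereas the rank-based mapping uses the whole range for them, which is the motivation for grouping values before assigning visual properties.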

IX. IMPLEMENTATION METHODOLOGY 

At the beginning of any mining task, the system acquires
the support for each attribute category defined at the
discretization step during the preprocessing phase of a
generalized composite record in the corresponding cluster.
Figure 3 depicts the user interface screens that acquire these
supports. In order to show how our technique has enhanced the
generated rules, we conducted the following experiment steps:

1) Run the system and give a variable support for each
attribute category based on the user's interest.

2) Count the number of rules generated and the number
of premises used in these rules.

3) Rerun the system and give equal support to all
attribute categories.

4) Count the number of rules and the premises used in
these rules.

5) Examine the quality of the rules generated in each case
by comparing the number of rules and premises used.



[Screenshot residue: "Association Rules" window with menu bar: File, Generate Rule, Visualization, Mine Data]
The main window consists of the menu, the toolbar, and a text
area. The user can connect to a database (here, Oracle)
through the Connect sub-menu and disconnect from it
through the Disconnect sub-menu. Under the Generate Rule
menu, the user can choose to generate rules. The operations of
rule generation and rule visualization are mainly carried out
through the menu.

We use a VB Standard EXE project as the software development
tool to implement our project. VB provides an Integrated
Development Environment (IDE), which makes interface
design and program debugging very efficient. The menu can be
implemented using the Menu Editor. All the objects in the
main window can be designed visually.
Figure 4. Menu Editor

After the user chooses the "Connect" menu item, a Login
window is brought up. The Login module of the Association Rule
Software is shown in Fig. 5. After providing all the needed
information, the user can choose "Connect" to connect to the
DBMS.



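Private Sub cmdOK_Click()
    ' Try to connect with the credentials typed into the login form
    a.connect txtUserName.Text, txtPassword.Text

    If Loginsucceeded Then
        ' Once logged in, disable "Connect" and enable "Disconnect"
        Form1.mnuconn.Enabled = Not Loginsucceeded
        Form1.mnudisc.Enabled = Loginsucceeded
        Form1.Toolbar1.Buttons(1).Enabled = Not Loginsucceeded

        ' Close the login window and return to the main window
        Unload Me
        Form1.Show
    End If
End Sub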

Figure 5. Login Module 



Figure 3. Interface Screen 
Rule Generator

For each input data set, some parameters have to be
specified by the user for association rule generation. This
information can be arranged in the relevant table. Because the
data is not always in the same table, and it is sometimes
necessary to obtain the data from two or more different tables,
the user should be able to select multiple tables as the data
source. The user may also want to specify the lowest support
and confidence values in order to get the association rules of
interest. The stop level lets the user decide after how many
passes rule generation should be stopped. The information is
taken from the transaction table, and the user can click the
"Generate Rules" menu button to begin merging the data from
the different tables. The association rule generation algorithm
is then called to generate the rules.



1- Select the mining task and, consequently, the appropriate
cluster.

2- Get the confidence threshold for generating a rule (a rule
is only generated if the number of occurrences of records
described by this rule, divided by the total number of
records in the cluster, is greater than the given
confidence threshold).

3- Construct a matrix (the calculated relative weight matrix)
with a number of rows equal to the number of attributes (m)
and a number of columns (n) equal to the maximum number of
categories of any attribute.

4- Using the appropriate cluster, fill in the calculated
relative weight matrix with the relative weight of each
attribute category in this cluster.

5- Compare the calculated relative weights with the user-given
supports and mark irrelevant attribute categories.

6- For each generalized composite record do

7- For each attribute of the generalized composite record do
{ if the attribute category is irrelevant then mark it as
irrelevant; otherwise copy the relevant attribute category
into a new table }

8- Group similar rows in the new table and calculate a
confidence value for each group of records.

9- Generate rules.

Proposed algorithm 
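The numbered steps can be sketched in Python as follows. This is one possible reading under stated assumptions, not the authors' VB and Oracle implementation: records are dictionaries of attribute categories, `user_support` maps (attribute, category) pairs to the user-given support, the conclusion attribute `target` is always kept, and, following step 2, the confidence of a grouped row is its record count divided by the cluster size. All names and the sample data are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Record = Dict[str, str]
Category = Tuple[str, str]                 # (attribute, category)
Rule = Tuple[Tuple[Category, ...], str, int, float]


def relative_weights(cluster: List[Record]) -> Dict[Category, float]:
    """Steps 3-4: share of records in the cluster that carry each category."""
    counts = Counter((a, c) for rec in cluster for a, c in rec.items())
    return {key: cnt / len(cluster) for key, cnt in counts.items()}


def generate_rules(cluster: List[Record], user_support: Dict[Category, float],
                   min_conf: float, target: str,
                   default_support: float = 0.0) -> List[Rule]:
    weights = relative_weights(cluster)
    # Step 5: a category is relevant only if it reaches its user-given support.
    relevant = {key for key, w in weights.items()
                if w >= user_support.get(key, default_support)}
    # Steps 6-7: copy only the relevant categories of each record.
    reduced = []
    for rec in cluster:
        premise = tuple(sorted((a, c) for a, c in rec.items()
                               if a != target and (a, c) in relevant))
        if premise and target in rec:
            reduced.append((premise, rec[target]))
    # Step 8: group identical rows and compute their confidence.
    total = len(cluster)
    rules = [(premise, concl, count, count / total)
             for (premise, concl), count in Counter(reduced).items()
             if count / total > min_conf]
    # Step 9: emit rules, most frequently satisfied first.
    rules.sort(key=lambda r: r[2], reverse=True)
    return rules


cluster = [
    {"ATT": "A", "MARK": "A", "EVL": "A"},
    {"ATT": "A", "MARK": "A", "EVL": "A"},
    {"ATT": "A", "MARK": "B", "EVL": "A"},
    {"ATT": "C", "MARK": "C", "EVL": "C"},
]
supports = {("ATT", "C"): 0.5, ("MARK", "B"): 0.5, ("MARK", "C"): 0.5}
rules = generate_rules(cluster, supports, min_conf=0.4, target="EVL")
```

In this toy cluster only ATT=A and MARK=A reach their supports, so the single rule that survives the 0.4 threshold is "IF ATT=A, MARK=A THEN EVL=A", satisfied by two of the four records.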



The algorithm is based on well-known existing techniques
for obtaining association rules, such as the Apriori algorithm.
It is modified to enable users to control and impose
their area of focus during the knowledge discovery steps, in
order to overcome the loss-of-information problem and to let
them generate the rules they are interested in. The
proposed algorithm addresses this problem by allowing the user
to define the relative weight, or support, of each attribute
interval category, so that the mining algorithm generates
rules using an attribute interval category only if its
support is satisfied.
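A hedged sketch of how an Apriori-style level-wise search can respect per-category supports is given below. It is not the authors' algorithm: it assumes that an itemset must reach the largest of the thresholds of its items, which keeps the usual Apriori property that every subset of a frequent itemset is frequent. The function name `frequent_itemsets`, its parameters and the sample data are illustrative.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List


def frequent_itemsets(transactions: List[FrozenSet[str]],
                      item_minsup: Dict[str, float],
                      default_minsup: float,
                      max_size: int = 3) -> Dict[FrozenSet[str], float]:
    """Level-wise search where each item may carry its own minimum support."""
    n = len(transactions)
    if n == 0:
        return {}

    def supp(itemset: FrozenSet[str]) -> float:
        return sum(1 for t in transactions if itemset <= t) / n

    def threshold(itemset: FrozenSet[str]) -> float:
        # Assumption: the strictest threshold among the items applies.
        return max(item_minsup.get(i, default_minsup) for i in itemset)

    items = {i for t in transactions for i in t}
    level = {frozenset([i]) for i in items
             if supp(frozenset([i])) >= threshold(frozenset([i]))}
    result: Dict[FrozenSet[str], float] = {}
    size = 1
    while level and size <= max_size:
        for s in level:
            result[s] = supp(s)
        size += 1
        candidates = {a | b for a in level for b in level if len(a | b) == size}
        # Apriori pruning: every (size - 1)-subset must already be frequent.
        level = {c for c in candidates
                 if all(frozenset(sub) in result for sub in combinations(c, size - 1))
                 and supp(c) >= threshold(c)}
    return result


db = [
    frozenset({"ATT=A", "MARK=A"}),
    frozenset({"ATT=A", "MARK=A"}),
    frozenset({"ATT=A", "MARK=B"}),
    frozenset({"ATT=C", "MARK=B"}),
]
# The user demands a higher support before MARK=B may appear in any rule.
found = frequent_itemsets(db, item_minsup={"MARK=B": 0.75}, default_minsup=0.5)
```

Here MARK=B occurs in half of the transactions, which would pass the default threshold but not the stricter one the user attached to it, so it is excluded from every itemset.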

The generated rules can be visualized either in table format
or in 2D or 3D format by selecting the appropriate
Visualization menu item. When the program is executed, the
title screen appears, as shown in Fig. 3.

Clicking on the Visualization menu gives access to the
different graphs related to the association rules. The graph of
our choice can then be selected by clicking on one of the
option buttons available in the visualization effect window, as
shown in Figure 6.

Some of the generated rules are given in Table 2 in a form
that is understandable by humans. In Table 2, the first column
gives the rule number, the second column presents the
generated rule, the third column gives the number of students
who satisfy the rule, and the last column gives the number of
attributes contained in the rule. The table shows the rules in
descending order of the number of students who satisfy them.
This ordering helps in determining the most significant rule.
Among the generated rules, the longest consists of 10
attributes while the shortest contains only 3 attributes.

TABLE 2 GENERATED RULES

Rule#  Rules                                            #Obj  #Attrib
7      IF ENR = Y, ATT = A, INT = A, G = M, STD = IT,   13    10
       ACT = A, PSA = A, ET = A, ER = B, MARK = A
       THEN EVL = A
3      IF ENR = Y, ATT = B, INT = A, G = F, STD = IT,   9     10
       ACT = A, PSA = A, ET = A, ER = B, MARK = A
       THEN EVL = A
11     IF ENR = Y, ATT = B, INT = A, G = M, STD = CS,   9     10
       ACT = A, PSA = C, ET = C, ER = B, MARK = B
       THEN EVL = B
17     IF ENR = Y, ATT = C, INT = A, G = M, F = SC,     8     9
       STD = ME, ACT = A, PSA = B, MARK = B
       THEN EVL = A
9      IF ENR = Y, ATT = A, INT = B, G = F, ACT = A,    5     8
       PSA = A, ET = A, ER = B, MARK = B
       THEN EVL = B
14     IF ENR = Y, ATT = C, G = M, ACT = B, PSA = A,    4     7
       ER = B, MARK = A THEN EVL = A
3      IF ENR = Y, ATT = C, MARK = C THEN EVL = C       3     3



We implemented the mapping of the intermediate rule table
into a format that the user can understand easily. A
visualization module that includes a rule table and 2-D and
3-D graphics was developed to help the user find the
information of interest more easily through sorting and
filtering functions. Besides its performance, our software can
access data stored in multiple tables of a DBMS such as Oracle
through ODBC. Our visualization module uses the 'rule-item'
relationship so that it can display more rules at one time. In
addition, the rule sorting and filtering ability of our
visualization module gives the user more flexibility and
efficiency in managing and understanding the association
rules. In our implementation, we store the generated rules in
the database. Once the rules are stored in the database, they
can be easily handled thanks to SQL capabilities.
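As a rough illustration of handling stored rules with SQL, the sketch below uses Python's standard `sqlite3` module with an in-memory database as a stand-in for the Oracle and ODBC setup of the paper. The table name `rules`, its column names and the helper `top_rules` are assumptions; the three rows are taken from Table 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rules ("
    " rule_no INTEGER, rule_text TEXT, n_obj INTEGER, n_attrib INTEGER)"
)
rows = [
    (7, "IF ENR=Y, ATT=A, INT=A, G=M, STD=IT, ACT=A, PSA=A, ET=A, ER=B, "
        "MARK=A THEN EVL=A", 13, 10),
    (14, "IF ENR=Y, ATT=C, G=M, ACT=B, PSA=A, ER=B, MARK=A THEN EVL=A", 4, 7),
    (3, "IF ENR=Y, ATT=C, MARK=C THEN EVL=C", 3, 3),
]
conn.executemany("INSERT INTO rules VALUES (?, ?, ?, ?)", rows)


def top_rules(connection: sqlite3.Connection, conclusion: str, min_obj: int):
    """Filter rules by conclusion and object count, most supported first."""
    cursor = connection.execute(
        "SELECT rule_no, n_obj FROM rules "
        "WHERE rule_text LIKE ? AND n_obj >= ? "
        "ORDER BY n_obj DESC",
        (f"%THEN {conclusion}", min_obj),
    )
    return cursor.fetchall()


evl_a = top_rules(conn, "EVL=A", min_obj=4)
```

Sorting and filtering are delegated to the database engine here, so the client only has to display the rows it receives.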




Figure 6. Visualization effect 



X. CONCLUSION 

The framework proposed will ease the task of association 
rule mining by giving users greater control over the mining 
task and by improving their ability to interpret the rules, 
evaluate their relevance and obtain insight on the knowledge 
mined from large datasets. We rely on interactive 
visualizations as an efficient approach to bridge the gap 
between task automation and user control in mining tasks. 

This study has bridged a gap in educational data analysis
and shows the potential of the association rule mining
algorithm for enhancing the effectiveness of academic
planners and level advisers in higher institutions of learning.
The analysis reveals some hidden patterns in students'
performance which could serve as a bedrock for academic
planners in making academic decisions and as an aid in
curriculum restructuring and modification, with a view to
improving students' performance. To adopt this approach, a
larger number of students should be considered, from the first
year to the final year in the institution. This is likely to
reveal more interesting patterns. With all these observations,
if academic planners make use of the hidden patterns extracted
from students' performance using the association rule mining
approach, it should help in curriculum restructuring and in
monitoring the students' ability. This will enable the academic
advisers to guide students properly on the courses they should
enroll in. This, eventually, tends to increase the student
placement rate.



XI. FUTURE

We conclude by remarking that the visualization of association
mining results in particular, and of data mining results in
general, is a promising area of future work. Educational,
research, government and business institutions can benefit
significantly from the symbiosis of the data mining and
information visualization disciplines.



REFERENCES 

[1.] Fayyad U., Piatetsky-Shapiro G., Smyth P.: "From Data Mining to
Knowledge Discovery: An Overview", Advances in Knowledge
Discovery and Data Mining, AAAI Press, Menlo Park, CA, pp. 1-30.

[2.] AGRAWAL R., MANNILA H., SRIKANT R., TOIVONEN H., &
VERKAMO A.I. (1996), Fast discovery of association rules, Advances
in knowledge discovery and data mining, American Association for
Artificial Intelligence, p. 307-328.

[3.] O. Couturier, E. Mephu Nguifo, and B. Noiret. A formal approach to 
occlusion and optimization in association rules visualization. In 
Proceedings of VDM of IEEE 9th International Conference on 
Information Visualization (IV@VDM'05), Poster, UK, July 2005. 

[4.] LIU B., HSU W., WANG K., CHEN S. (1999), Visually aided
exploration of interesting association rules, Proceedings of the 3rd
Pacific-Asia Conference on Knowledge Discovery and
Data Mining (PAKDD'99), Beijing, China, p. 380-389.

[5.] BEN YAHIA S., MEPHU NGUIFO E. (2004), Emulating a
cooperative behavior in a generic association rule visualization tool. In
Proceedings of the 16th IEEE International Conference on Tools with
Artificial Intelligence (ICTAI'04), Boca Raton, Florida, USA.

[6.] BLANCHARD J., GUILLET F., & BRIAND H. (2003), Exploratory
Visualization for Association Rule Rummaging, Proceedings of the 4th
International Workshop on Multimedia Data Mining MDM/KDD2003,
Washington, D.C., U.S.A., p. 107-114.

[7.] WONG P.C., WHITNEY P., & THOMAS J. (2000), Visualizing
Association Rules for Text Mining, Proceedings of the 1999 IEEE
Symposium on Information Visualization (INFOVIS '00), Salt Lake
City, Utah, USA, p. 120-128.

[8.] Grinstein, G. G., Pickett, R. M. and Williams, M., EXVIS: An 
Exploratory Data Visualization Environment. Proceedings of Graphics 
Interface '89 pages 254-261, London, Canada, 1989. 

[9.] Chernoff, H. The use of faces to represent points in k-dimensional 
space graphically. Journal of the American Statistical Association 68, 
342, pages 361-367, 1973. 

[10.] Levkowitz, H. Color Icons: Merging Color and Texture Perception for
Integrated Visualization of Multiple Parameters, Proceedings of IEEE
Visualization '91 Conference, San Diego, CA, Oct. 1996.

[11.] Pickett, R. M. and Grinstein, G. G., Iconographic Displays for
Visualizing Multidimensional Data. IEEE Conference on Systems,
Man and Cybernetics. China, 1988.

[12.] Wegenkittl, R., Löffelmann, H., Gröller, E., Visualizing the behavior of
higher dimensional dynamical systems. Proceedings of the conference
on Visualization '97, 1997, Phoenix, Arizona, United States.

[13.] Christopher, G. Healey, James T. Enns, Large Datasets at a Glance: 
Combining Textures and Colors in Scientific Visualization. IEEE 
Transactions on Visualization and Computer Graphics, Volume 5, 
Issue 2, 1999. 

[14.] Foley, J., and Ribarsky, W. Next-generation data visualization tools. 
Scientific Visualization: Advances and Challenges, L. Rosenblum, Ed. 
Academic Press, San Diego, California, pages 103-127, 1994. 

[15.] Laidlaw, D. H., Ahrens, E.T., Kremers, D., Avalos, M.J., Jacobs, R.E., 
and Readhead, C. Visualizing diffusion tensor images of the mouse 
spinal cord. Proceedings of Visualization '98, pages 127-134, 1998 

[16.] Pickett, R. M. and Grinstein, G. G., Iconographic Displays for
Visualizing Multidimensional Data. IEEE Conference on Systems,
Man and Cybernetics. China, 1988.

AUTHORS PROFILE 

Mohammad Kamran is a Software Developer. His primary interests lie in the
areas of data mining and association rules. He is currently a research scholar
in Computer Science at Integral University. This paper summarizes the current
state of his thesis work in the field of "study of association rules in large
database".






http://sites.google.com/site/ijcsis/ 
ISSN 1947-5500 



(IJCSIS) International Journal of Computer Science and Information Security, 
Vol.9, No. 2, February 2011 



Video Delivery based on Multi- Constraint Genetic and Tabu Search Algorithms 



Nibras Abdullah 1 , Mahmoud Baklizi 1 , Ola Al-wesabi 2 , Ali Abdulqader 1 , Sureswaran Ramadass 1 , Sima Ahmadpour 1 

1: { abdullahfaqera, mbaklizi, ali, sures, sima} @ nav6.org , ola_osabi@yahoo.com 

1 : National Advanced IPv6 Centre of Excellence 

1 : Universiti Sains Malaysia 

1 : Penang, Malaysia 



Abstract: The rapid growth of wireless communication and
networking protocols, such as IEEE 802.11 and cellular mobile
networks, is bringing video into our lives anytime and anywhere
on any device. Video delivery over a wireless network faces
several challenges, such as limitations, bandwidth
variation, and high error rates. This paper proposes a new
approach to improve the performance of video delivery, called
Video Delivery based on Multi-Constraint Genetic and Tabu
Search algorithms. In this approach, GA is used to find the
feasible paths and Tabu search is used to select the best of
those paths, which helps to improve bandwidth and delay and to
reduce packet loss for wireless video content delivery.

Keywords: GA, Tabu Search, Multi-hop network, Video
delivery.



I. INTRODUCTION



In recent years, video conference systems have become one of
the most widely used real-time applications. In addition,
real-time embedded systems are found in many diverse
application areas including automotive electronics, avionics,
telecommunications, space systems, medical imaging, and
consumer electronics. The transport of real-time video streams
over the Internet using wired and wireless multimedia
delivery faces several challenges such as random channel
variation, bandwidth scarcity and limited storage capacity [1].
The quality of service (QoS) of the video should be
guaranteed at a low bit rate. In addition, different
applications have various QoS requirements to achieve users'
satisfaction. QoS depends on parameters such as
throughput, bandwidth, delay, error rate control, and packet
loss [2] [3] [4] [5]. The transportation paths are chosen
according to those parameters. Nowadays, optimal path
routing algorithms do not support alternate routing: if the
existing path is the best path and it cannot accept a new flow,
the associated traffic cannot be transmitted, even if an
appropriate alternative path exists. Hence, quality of service
routing algorithms must clearly be adaptable, flexible, and
intelligent enough to make fast decisions. To achieve this, a
Genetic Algorithm (GA), based on computational strategies
inspired by natural processes, is used. GA is a global
optimization technique derived from the principles of natural
selection and evolutionary computing [6] [7] [8] [9]. GA has
been proven, theoretically and empirically, to be a robust
search technique. Each possible point in the search space of
the problem is encoded into a suitable representation for
applying GA. In GA, each population of individual solutions
with fitness values is transformed into a new generation of the
population, according to the Darwinian principle of the
survival of the fittest. By applying genetic operators, such as
crossover and mutation, GA produces better approximations to
the solutions. Many routing algorithms based on GA have been
proposed [2] [10] [11]. Selection and reproduction at each
iteration produce a new generation of approximations. The
outline of the basic GA is shown in Figure 1.
Figure 1. Outline of the basic GA [12]

In the genetic representation, solutions are encoded as arrays of
integers.
The stages of a GA are:

1. Select the initial population.

2. Determine the fitness of every individual in the initial
population.

3. Do

   1. Select the best-ranking individuals to reproduce.

   2. Breed a new generation through crossover and mutation
   (genetic operations), giving birth to offspring.

   3. Evaluate the individual fitness of the offspring.

   4. Replace the lowest-ranked part of the population with the
   offspring.

4. While (not terminating condition).
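
The stages listed above can be sketched as a short loop. This is only an illustration under stated assumptions, not the implementation used in this paper: the names `run_ga`, `one_point`, and `flip_one` are hypothetical, individuals are lists of integers, the two best-ranked individuals are always the parents, and each generation produces a single child.

```python
import random

def run_ga(population, fitness, crossover, mutate, generations=50, seed=0):
    """Generic GA loop; needs a population of at least two individuals."""
    rng = random.Random(seed)
    # Stages 1-2: take the initial population and rank it by fitness.
    ranked = sorted(population, key=fitness, reverse=True)
    for _ in range(generations):          # stages 3-4: do ... while
        # 3.1 select the best-ranking individuals to reproduce
        parent_a, parent_b = ranked[0], ranked[1]
        # 3.2 breed an offspring through crossover and mutation
        child = mutate(crossover(parent_a, parent_b, rng), rng)
        # 3.3-3.4 evaluate the offspring and replace the lowest-ranked one
        if fitness(child) > fitness(ranked[-1]):
            ranked[-1] = child
            ranked.sort(key=fitness, reverse=True)
    return ranked[0]

def one_point(a, b, rng):
    """Single-point crossover of two equal-length integer arrays."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def flip_one(bits, rng):
    """Mutation: flip one randomly chosen 0/1 gene."""
    i = rng.randrange(len(bits))
    return bits[:i] + [1 - bits[i]] + bits[i + 1:]

# Toy problem (assumed for illustration): maximise the number of ones.
start = [[0, 0, 0, 0, 0, 0], [1, 0, 0, 0, 0, 0],
         [0, 1, 0, 0, 0, 0], [0, 0, 0, 1, 0, 0]]
best = run_ga(start, sum, one_point, flip_one)
```

Because only the lowest-ranked individual is ever replaced, the best fitness in the population never decreases from one generation to the next.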

In this paper, we propose a new approach that combines a genetic
algorithm with the Tabu search technique, so that past experience
can be used to improve current decision-making when choosing
efficient paths.

Tabu search is a global heuristic technique that tries to avoid
falling into a local optimum by maintaining a special list called
the Tabu list. Every recently chosen solution is placed in the Tabu
list and is "taboo" for a short period of time that depends on the
length of the list. This decreases the probability of revisiting the
same solution and thus creates more opportunities for improvement by
moving into unexplored areas of the search space. In 1997, Glover
and Laguna gave a comprehensive description of the Tabu search
technique [13]. In addition, many algorithms based on Tabu search
have been developed and have achieved considerable improvements
[14][15][16]. The basic idea of the Tabu search technique is shown
in Figure 2.
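
The role of the Tabu list can be illustrated with a minimal sketch. The function name `tabu_search`, the fixed list length, and the toy one-dimensional problem are assumptions made for illustration; they are not taken from this paper.

```python
from collections import deque

def tabu_search(start, neighbors, score, tabu_size=5, iterations=100):
    """Move to the best non-tabu neighbour at every step (maximisation)."""
    best = current = start
    # A bounded deque: once full, the oldest entry stops being taboo.
    tabu = deque([start], maxlen=tabu_size)
    for _ in range(iterations):
        candidates = [n for n in neighbors(current) if n not in tabu]
        if not candidates:
            break
        # The move is accepted even if it is worse than the current
        # solution, which lets the search leave a local optimum.
        current = max(candidates, key=score)
        tabu.append(current)
        if score(current) > score(best):
            best = current
    return best

# Toy problem (assumed): find the integer x that maximises -(x - 7)^2.
result = tabu_search(0, lambda x: [x - 1, x + 1], lambda x: -(x - 7) ** 2)
```

In this toy run the search climbs to 7, is then forced onward because 6 is still in the Tabu list, and returns 7 as the best solution seen.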
Figure 2. Tabu search technique

II. Problem statements

Several basic challenges must be solved to provide high-quality
multimedia delivery over multi-hop wireless networks.

- It is well known that the bit error rate (BER) of a wireless
network is much higher than that of wired links. The shared wireless
medium and contention from neighboring traffic aggravate the
bandwidth restrictions and increase the channel error in the
multi-hop network. Video coders such as MPEG and H.26x compress
video efficiently, but the compressed bit stream is fragile in the
face of channel loss.

- Congestion is not the only reason for packet loss in a wireless
network: many packet losses are a consequence of random channel
errors, which can be measured over a multi-hop network [17]. On the
other hand, the routing changes and breaks that occur frequently in
multi-hop networks are another cause of packet loss. The system
should be aware of these packet losses, because this is critical for
correct error control and resource allocation, especially for
multimedia streaming applications.

- There is a need for more QoS support mechanisms in multi-hop
wireless networks: the standard for such networks, IEEE 802.11, has
a serious shortcoming in a multi-hop environment because of
contention from neighboring traffic and hidden-terminal effects.

- The routing layer, MAC layer, and physical layer together compete
for the network resources in a wireless network. For wireless
networks, the traditional "layered" protocol stack is not sufficient
because of the direct coupling between the physical layer and the
upper layers [18].

Multimedia video applications have diverse QoS requirements, which
are expressed by the QoS parameters: delay, hop count, jitter delay,
bit error rate, and bandwidth.

Consider a network $G(N, E)$, where $N$ is the set of nodes and $E$
is the set of edges, in which each link $(u \to v) \in E$ is
associated with link weights $w_i(u \to v) > 0$ for all
$i = 1, \ldots, l$. Given $l$ constraints $K_i$, where
$i = 1, \ldots, l$, the multiple-constraint problem is to find a
path $p$ from the source (initial node $i$) to the destination node
$t$, as shown in Figure 3.

Figure 3. A sample Network
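
The multiple-constraint path problem stated in this section can be illustrated by a small feasibility check. The sketch below makes several assumptions that are not in the paper: a path is a list of node names, `link_weights` maps each directed link to a tuple of weights (one per constraint), the bounds are treated as inclusive, and the example topology and numbers are invented.

```python
def path_weights(path, link_weights, num_constraints):
    """Sum each of the link weights along the path, one total per constraint."""
    totals = [0.0] * num_constraints
    for u, v in zip(path, path[1:]):          # consecutive node pairs = links
        for i, weight in enumerate(link_weights[(u, v)]):
            totals[i] += weight
    return totals

def is_feasible(path, link_weights, limits):
    """A path is feasible when every accumulated weight respects its limit."""
    totals = path_weights(path, link_weights, len(limits))
    return all(total <= limit for total, limit in zip(totals, limits))

# Hypothetical network: two weights per link, e.g. (delay, hop cost).
weights = {
    ("i", "a"): (2, 10),
    ("a", "t"): (3, 5),
    ("i", "t"): (9, 1),
}
limits = (6, 20)
```

With these invented numbers, the path i, a, t accumulates (5, 15) and is feasible, whereas the direct link i, t violates the first limit.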

III. Proposed Method 

The flowchart of the proposed method, shown in Figure 4, represents
how the problem is solved by finding a feasible path p from source
node i to destination node t such that:

$$w_i(p) = \sum_{(u \to v) \in p} w_i(u \to v) \le K_i \quad \text{for all } i = 1, \ldots, l \qquad (1)$$

The terms used in the flowchart are:

Population - all available paths.

Parent Selection - a selection strategy that selects two individuals
from the population with the lowest fitness value.

Recombination - basically crossover and mutation.

Survivor Selection - replaces the two lowest-ranked individuals of
the population with offspring.

Termination - termination after a set time or number of iterations,
or once the terminating condition is achieved.

Representation and Encoding - Encoding is one of the problems
encountered when a GA is used to obtain a solution, and it depends
on the problem to which the GA is applied. In this paper, the genes
are represented by the tree junctions, and the network is
represented by a tree network [19]. With this coding method every
chromosome has the same length, and the genetic operations are
performed at the tree junctions. The encoding procedure is
illustrated by the sample network in Figure 3, in which node i is
the source node and t is the destination node.

Initial Population - generated randomly by choosing feasible points
in the gene coding that form a path. Population size refers to the
number of chromosomes in one generation. With only a few
chromosomes, the GA has few opportunities to perform crossover and
only a small part of the search space is explored; with very many
chromosomes, the GA slows down. In our proposal, the size of the
initial population depends on the number of outgoing links from the
source.



Figure 4. Proposed Algorithm flowchart

Fitness Function Evaluation - A fitness value is associated with
every solution by means of a fitness function. The fitness function
used in this paper to find the feasible paths is given in
equation 2.



$$F = \max \; \sum_{i=1}^{l} \;\ldots\; w_i(p) \qquad (2)$$



where l is the total number of constraints considered, p is the
path, K_i is the maximum acceptable constraint value specified for
the application, and w_i are the link weights, which are static and
depend on the physical properties of the link. The fitness value is
computed for each chromosome of the initial population.

Chromosome Selection - Chromosomes are chosen from the initial
population to be parents. According to Darwin's theory of evolution,
the best chromosomes should survive and generate offspring. Many
methods are available for selecting chromosomes, such as elitism
selection, steady-state selection, tournament selection, and
roulette-wheel selection. In this paper, we use the elitism
selection method, which copies the best chromosomes to the new
population. The genetic operation is performed by sorting the
chromosomes of the initial population by fitness value and then
choosing the first two at the top of the list.
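
The selection step just described (sort by fitness, take the top two) is small enough to sketch directly. The name `select_parents` and the example fitness, which simply prefers paths with fewer hops, are assumptions for illustration only.

```python
def select_parents(population, fitness):
    """Sort chromosomes by fitness, best first, and return the top two."""
    ranked = sorted(population, key=fitness, reverse=True)
    return ranked[0], ranked[1]

# Hypothetical chromosomes: candidate paths written as node lists.
paths = [
    ["i", "a", "b", "c", "t"],
    ["i", "a", "t"],
    ["i", "b", "c", "t"],
]
# Assumed toy fitness: fewer hops is better.
parents = select_parents(paths, fitness=lambda path: -len(path))
```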

Crossover and Mutation - the two fundamental operators of a GA,
which largely determine its performance. How these operations are
implemented depends on the encoding, which in turn depends on the
problem being solved by the GA [2]. In this paper we use a
single-point crossover at a tree junction to generate new offspring.
The mutation points chosen are the points that cause a constraint to
be violated.
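
A single-point crossover at a junction can be sketched as follows. This is a simplified reading, not the tree-junction encoding of [19]: paths are assumed to be plain node lists, a "junction" is taken to be any intermediate node shared by both parents, and the function name `crossover_at_junction` is hypothetical.

```python
import random

def crossover_at_junction(parent_a, parent_b, rng):
    """Swap the tails of two paths at a shared intermediate node."""
    # Candidate crossover points: nodes both paths pass through,
    # excluding the common source and destination.
    common = [n for n in parent_a[1:-1] if n in parent_b[1:-1]]
    if not common:
        return list(parent_a), list(parent_b)   # nothing to exchange
    junction = rng.choice(common)
    ia, ib = parent_a.index(junction), parent_b.index(junction)
    child_a = parent_a[:ia] + parent_b[ib:]
    child_b = parent_b[:ib] + parent_a[ia:]
    return child_a, child_b

a = ["i", "a", "c", "d", "t"]
b = ["i", "b", "c", "e", "t"]
child_a, child_b = crossover_at_junction(a, b, random.Random(0))
```

Both children still start at the source and end at the destination. In a larger network a child could revisit a node, so a real implementation would probably need a loop-removal or repair step.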

The proposed method is divided into two parts, a preprocessing part
and a processing part, as follows.

Preprocessing part: In this part, a short message, including its
time and length, is sent through the feasible (available) paths from
the initial point (client) to the target point (server). The
wireless network is connected by multiple hops and routers, as shown
in Figure 3. The genetic algorithm is then used to find the
available paths to the server, which is considered the central point
for communications. After that, those paths are stored in the Tabu
list, from which the efficient paths are determined by the Tabu
search technique in the processing part.

Processing part: In this part, the efficient path is chosen from the
Tabu list. After the message is received, the information it
contains is used as attributes and restrictions in the fitness
function of equation 2 to decide the efficient path.

Fitness

We need a fitness measure to evaluate and select the parents and
children, so that the best paths are kept for the next generation
and the worst are excluded. The fitness function depends on the hop
count, delay, bandwidth, and jitter delay of a chromosome. The
parameters most commonly used in the fitness function are path
number, hop number, delay, jitter delay, bandwidth, and efficient
path, denoted by I, P, C, RC, dp, and Ip, respectively. The
efficient path value (Ip) is set if all constraints are achieved;
otherwise it is set to 1.

To make the selection more efficient, each constraint is given a
weight percentage according to its importance. For example,
constraint 1, constraint 2, constraint 3, and constraint 4 are given
weight percentages of 75%, 50%, 25%, and 5%, respectively.

Depending on the number of constraints, the value F is calculated by
the following equation:

$$F = \max\Big(\operatorname{round}\Big(\sum \big(\mathrm{count}(P) + (\text{constraint value} \times \text{constraint weight})\big) + (\text{constraint value} \times \text{constraint weight}) - \big((dp / I) \times 100\big)\Big)\Big) \qquad (3)$$

The value of F from equation 3 is used to select the solution with
the maximum fitness value as the best solution.
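
The weighting idea can be illustrated with a deliberately reduced sketch. It covers only the weighted-sum part of the fitness (each constraint value multiplied by its weight, then rounded) and leaves out the other terms of equation 3. The weights are the example percentages from the text; the function names, the 0-100 scale of the constraint values, and the candidate data are assumptions.

```python
# Example weight percentages from the text, expressed as fractions.
WEIGHTS = (0.75, 0.50, 0.25, 0.05)

def weighted_score(constraint_values, weights=WEIGHTS):
    """Rounded weighted sum of the constraint values (partial fitness only)."""
    return round(sum(v * w for v, w in zip(constraint_values, weights)))

def best_path(candidates):
    """Return the name of the candidate with the maximum weighted score."""
    return max(candidates, key=lambda name: weighted_score(candidates[name]))

# Hypothetical candidates: one value per constraint on an assumed 0-100 scale.
candidates = {
    "path A": (80, 60, 40, 100),
    "path B": (88, 20, 40, 0),
}
winner = best_path(candidates)
```

Here path A scores 105 and path B scores 86, so path A is selected; with different weights the ranking could change, which is the point of weighting the constraints by importance.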

IV. Summary 

The proposed method is based on the Genetic Algorithm and the Tabu
search algorithm. GA is used to find the feasible paths by using
equation 1 and equation 2 and to get the best path according to the
constraints that are emphasized. Some constraints are more important
to satisfy than others. By using the Tabu algorithm with a given
weight percentage for each constraint to evaluate the fitness
function (equation 3), we can obtain the efficient paths under mixed
multiple constraints.
Abstract - Honeypots provide a system that can lure attackers and
hackers, respond to various security frameworks to control the
environment, and examine and analyze network activities. We employ
and develop a honeypot framework to propose a hybrid approach that
improves current security.

In this paper, we propose a hybrid-honeypot-based network that
assumes initiative and enterprise security scheme strategies. The
proposed model has the advantages that it can respond accurately and
swiftly to unknown attacks and makes the network safer over its
lifetime.

Keywords-Intrusion Detection System; User Datagram 
Protocol; Simple Mail Transfer Protocol; De-Militarized Zone; 
Secure Shell; Secure Sockets Layer; Internet Protocol 
Security; Network Traffic Monitoring; Network Address 
Translation; Dynamic Host Configuration Protocol 



I. Introduction



A honeypot can be deployed in network security to discover the
latest attack actions that might not be detected by Intrusion
Detection Systems (IDS) or network firewalls that conform to the old
static defense rule system. When the IDS and firewall are designed,
it is important to take into account the enterprise defense rules
that apply to the honeypot.

Computer networks are vulnerable to various exploits that can make
the network insecure or compromise its normal operation. Intruders
and attackers have rapidly become more aggressive toward network
security and its challenges. To improve security, enterprises,
organizations and, more importantly, finance departments have an
easy solution: deploying various hardware and software network
security products such as firewalls, variants of intrusion detectors
[18], and Virtual Private Networks. However, these solutions operate
continuously yet still leave proprietary information reachable by
determined intruders, and they merely issue warnings when new
attacks take place.

II. Background 

Since 2001, Internet worms have spread widely, infecting and
seriously damaging tens of millions of computers around the world
and harming the systems of hundreds of thousands of individuals and
organizations. The Code Red worm [2] was the first outbreak of this
kind since the Morris Worm [15] [5] of 1988; it compromised Internet
hosts and about 360 thousand vulnerable servers and launched
distributed attacks on web servers. Since then various types of
worms have appeared. The Blaster worm [25], among the most
destructive, exploited a service running on millions of personal
computers and could easily damage them. The Slammer worm [2], using
only UDP (User Datagram Protocol) packets, infected the vulnerable
population in only 10 minutes; such worms can spread with a single
UDP packet.

The Witty worm [3] likewise used a UDP packet to spread its
infection extensively. The conceptual basis of existing defense
technologies is to defend against attacks that are already known; in
other words, the strategy is to prevent attacks that have already
occurred. Defense and attack behaviors are classified by extracting
their common features, from which relevant strategies to prevent
these attacks can be derived.

There are various types of intrusion detection systems, with
different analysis and detection concepts, that monitor network
traffic [18]. However, only a few are also capable of chasing
intruders by deploying mobile agents [17]. A standalone intrusion
detection system cannot act as a complete attack-response mechanism;
nevertheless, it is the best-suited component for tracing incoming
intruders.

Many administrators who work on the security of production systems
use honeypots to study network activity. In 2002 another honeypot
classification was introduced [20], based on the level of
interaction; this classification depends on the honeypot
architecture and the objective it is applied to. A complex honeypot
can be created to give the invader an entire operating system with
which to interact. In contrast, to detect unauthorized activity such
as port scanning and system exploitation, a honeypot that merely
emulates different running services can be designed to gather the
fingerprints of invaders.
III. Analysis of Proposed Model

A. Low Interaction Honeypots 

Low-interaction honeypots are simple to deploy widely, but they can
be less effective because intruders can detect them easily, and
certain commands can bring the emulated interaction down. An example
of a low-interaction honeypot [20] is honeyd.

A low-interaction honeypot provides limited interaction with
invaders by letting them interact with emulated services. The
intention of this type of honeypot is to collect data about the
first step of an attack; data about the threat's motivation is
rarely captured, because of the low level of interaction and the
absence of an effective system compromise.



A virtual honeypot software process requires an IP address.
Multiple virtual honeypots typically use several IP addresses and
network interfaces while sharing a single running process. Hence,
the virtual honeypots are set up on one physical machine, much as
network address translation runs on a firewall. Most
high-interaction honeypots allow the system to be compromised
completely, whereas low-interaction honeypots only emulate services
virtually, so their capability is limited.

Honeyd's main job is to provide warnings, most of which are correct
alerts of real attacks. By default, honeyd can detect any activity
on any User Datagram Protocol (UDP) or Transmission Control Protocol
(TCP) port, and it also records some ICMP (Internet Control Message
Protocol) activity. In addition, it can deceive the attacker through
its ability to simulate the operating systems in use: its response
packets are crafted to match the fingerprints expected by scanning
tools such as Nmap. An attacker can also interact with honeyd's
emulated services, such as Telnet, FTP, HTTP, POP3, and SMTP (Simple
Mail Transfer Protocol) servers. Moreover, it can emulate backdoors
for malware such as Kuang2 and Mydoom.

B. High Interaction Honeypots 

In this paper, we deploy a honeynet and develop a variety of tools
to support our research on capturing and examining suspicious
network traffic. In our design, we provide a web interface to
monitor the information gathering, and in the back end a firewall to
control outgoing connections from a potentially compromised
honeypot. A high-interaction honeypot host can be implemented
cost-effectively: mid-range organizations mostly use a virtual
environment, which is easier to monitor and allows a safe and clean
recovery after a successful compromise. Virtual machine solutions
for this environment include Virtual PC [22], VirtualBox [23], Xen
[19], VMware [24], and User Mode Linux [9].

So that the high-interaction honeypot gathers real network
information and faces different scans, buffer overflows and various
analyses, we use a few real machines to support our production
server and collaborate with the low-interaction honeypot zone, in
order to obtain realistic experimental results.
Many recent studies have explored the deployment of honeypots to
enhance network security, among them [4] [10] [11] [12] [13] [21]
[26] [28] [29]. In Weiler's proposal [26], honeypots act as a shield
for the network: all incoming traffic is directed to them, and a
decision is then made either to drop the connection or to allow it
as legitimate. This solution may not be ideal, because honeypots are
meant to attract attackers and be compromised, not to serve as a
prevention or defense mechanism. Teo [21] gives another solution
framework, called Japonica, whose main target is an early and rapid
response to unknown attacks through dynamic orchestration of
detection, prevention, and reaction mechanisms for particular
attacks. However, the probability of false alarms remains a very
important issue, for example when a person legitimately tries to
access production services rather than attacking the honeypots.

To conclude, many of the above proposals use the honeypot as a
defense mechanism to block the attacker from attacking the network.
In this paper, the proposed hybrid honeypot architecture contains
both low-interaction and high-interaction honeypots and provides a
framework that is not a blocking or defensive system but an
interactive lure design that minimizes the traditional mistakes.

C. Hybrid Honeypots 

The need to collect detailed information about attack processes
across a large number of IP addresses has urged researchers and
network security providers to pursue more intelligent and scalable
architectures. This research leads to the large-scale architecture
category called the hybrid honeypot architecture.

IV. Approached Model 

A. Worms Activity 

From a network point of view, a worm is a program that, once running
on a honeypot, can cause other honeypots to change their behavior
enough that they start to generate connection requests themselves.
This definition helps to distinguish an infection that is
self-spreading from network activity that is not self-distributing
but merely takes a system down and reconfigures it with its
particular code without automatically continuing the process. Almost
all types of worms carry their own executable code, which indicates
that the captured worms made multiple connections and used system
buffer overflows or password guessing to deliver their executables.
Most of these executables have a name that is directly associated
with the worm, since they are available as files from the worm's
initial exploit. Table I gives the various worm models and the
number of each captured on our network.

The proposed work offers a lure architecture that draws in internal
network attacks through the hybrid honeypot, which is able to
capture and record all incoming and existing data and provides data
control. The proposed honeynet captures all the activities and
operations of intruders and sends them to the log for further use.

B. Data Analyzing Module

The data analyzing module analyzes the data collected by the
honeynet. The honeynet records data through its internal honeypots
and forwards them for analysis. In between, we also use a firewall
to obtain more information about the captured data, and we direct
the firewall logs to our analyzer.

In the proposed architecture, a firewall module works as a logger
that captures all the traffic and its state in our back-end design,
which preserves the accessibility of our production systems.

C. Honeynet Activity

As previously mentioned, the honeynet has two main activities:
information control and information seizure (data recording). The
primary idea of information control is to prevent invaders from
abusing the honeynet to reach other hosts. Information seizure
captures all the actions of invaders. It is hard to gather
information as unobtrusively as possible so as not to be recognized
by intruders.

Most invaders try to use encrypted channels such as SSL (Secure
Sockets Layer), IPSec, and SSH (Secure Shell). In such cases, the
data collection mechanism must take the encryption into account. For
this reason, we also deploy capture tools with similar functionality
on the honeypot itself, to obtain multi-level recording [1]. In this
way we can not only link the various steps of an intruder's activity
together, but also avoid the failure of a single mechanism.

Logs and the system activity recorded by tools in the honeypot are
transferred to the analyzing module. The information is saved
according to the features of the network connection and its
contents. The information recorded by the honeynet is small in
volume but of high fidelity and importance.

By taking advantage of virtualization technology, which is also used
in the honeynet, we are able to set up virtual honeypots [14] on a
single host. This helps to reduce the development cost of the
honeynet. Nevertheless, the performance required of the host is
higher.

D. De-Militarized Zone

A De-Militarized Zone (DMZ) is not a network hardware device like a
router or a bridge [8], so it does not pass through altered packets.
A DMZ is designed to provide secure communication with servers
before packets enter the firewall, without needing any inbound
firewall openings between the internal LAN and the deployed DMZ.

The policy establishes the security requirements for the networks,
machines and peripherals deployed within the DMZ. A traditional DMZ
allows machines located behind the firewall to initiate requests
outgoing to the DMZ. Machines in the DMZ reply to, forward, or
reissue the requests out to the Internet or public network.


Many DMZ deployments simply use a server (such as a proxy server) or
several servers as the machines within the DMZ. The firewall, in
turn, prevents the machines situated in the DMZ from initiating
inbound requests. In a DMZ configuration, most machines on the
internal network or a typical LAN run behind the firewall, through
which they connect to an external network or the Internet. To form
the secure zone, a few machines or servers are also placed outside
the firewall in the DMZ; these machines on the external side
intercept traffic and broker requests for the rest of the network,
and they provide an extra layer of protection for the machines
behind the firewall.

A DMZ most often includes servers that provide various services to
clients from the Internet. These services include FTP, the e-mail
services SMTP, IMAP4 and POP3, and DNS. Although these servers must
be restricted to limited access from the Internet, they can also
help protect the firewall. The servers and honeypots could reside
either in the DMZ or inside the network; however, the DMZ is
suggested. The structure we are looking for is shown in Fig. 1.

E. Proposed Hybrid Honeypot Framework 

The proposed approach introduces a flexible honeypot-based network
security system that adapts to changes in, in particular,
organizational, financial and other important server-zone networks,
based on the dynamic deployment and configuration of hybrid
honeypots.

The primary concept is for the low-interaction honeypots to occupy
the free, unused IP addresses and to imitate the operating systems
and services of the production hosts deployed in the network. In
most cases the network traffic going to the honeyds is redirected to
a high-interaction honeypot, where attackers face real services. The
hybrid deployment approaches honeypot technology in two main ways:

- Minimal administrative intervention, because the number of honeyds
and their particular service setups are determined automatically
based on the state of the network.

- Focusing the honeynets, or high-interaction honeypots, where they
are needed in the network: redirecting traffic from the
low-interaction honeypots makes the honeyds appear as real systems
to attackers.
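
Finding the free addresses that the low-interaction honeypots can occupy is a simple set difference. The sketch below uses Python's standard `ipaddress` module; the function name `unused_addresses`, the example subnet, and the list of addresses in use are assumptions for illustration.

```python
import ipaddress

def unused_addresses(subnet, in_use):
    """Return the host addresses of a subnet that no production host uses."""
    network = ipaddress.ip_network(subnet)
    taken = {ipaddress.ip_address(addr) for addr in in_use}
    # hosts() skips the network and broadcast addresses of the subnet.
    return [str(host) for host in network.hosts() if host not in taken]

# Hypothetical /29 network with three production hosts.
free = unused_addresses(
    "192.168.1.0/29",
    ["192.168.1.1", "192.168.1.2", "192.168.1.5"],
)
```

In this example the remaining addresses are 192.168.1.3, 192.168.1.4 and 192.168.1.6, which could then be bound to honeyd templates.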

F. Proposed Honeynet 

Given the fake machines available in the network, the system
administrator first assigns the IP addresses of the physical
honeypots (the essential hosts) in the honeynet, then authorizes
traffic redirection from the low-interaction honeypots and logs the
activities of attackers. The term redirection does not simply mean
changing the direction of communication between machines. Rather, it
involves rewriting the incoming network packets destined for a
particular honeyd and sending them back onto the network, so that
they find their way to the real honeypot. The real honeypot then
replies to the intruder, giving the invader the illusion of engaging
with a single real machine.

We illustrate our hybrid honeypot approach with an example of a
typical Local Area Network. Fig. 1 shows the deployment and the
position of each component. The figure shows a low-interaction
honeypot server that is directly connected to the main switch
alongside the other production systems. It also shows the physical
honeypots of the honeynet, which are ready to receive network
traffic either directly or redirected through the low-interaction
honeypot. As shown in the architecture, the low-interaction honeypot
machines appear to be physical or production systems, but in fact
they are created in advance as virtual machines.

In our architecture, we may use Network Address Translation (NAT).
Its aim is to avoid reconfiguring each honeypot when internal
addresses are mapped dynamically to the external addresses that
arrive through NTM (Network Traffic Monitoring). If the honeypots
are set up to support dynamic address reconfiguration, this part can
be skipped.

The low-interaction honeypot server illustrated in this figure has
three main functions, which are implemented in different threads.
The first thread drives a network scanning application to gain
information about the operating systems present in the network,
their open ports and their running services, and saves these data in
a file. The next thread reads these data from the file and adjusts
the configuration of the low-interaction honeypots accordingly, so
that it reflects the operating systems, services, ports and address
distribution of the real network. The last thread analyzes the
traffic logs of the low-interaction honeypots and saves the results
in a separate file. Furthermore, the server waits for arriving
traffic destined for unused IP addresses and assumes the identity of
those IPs while invaders are engaged.
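
The second thread, which turns scan results into a honeypot configuration, could look roughly like the sketch below. It emits text in Honeyd's template syntax (`create`, `set ... personality`, `add ... tcp port`, `bind`). The function name `honeyd_template`, the template name, the addresses, and the personality string are illustrative assumptions; in practice the personality must match an entry in Honeyd's fingerprint database.

```python
def honeyd_template(name, personality, open_tcp_ports, addresses):
    """Build a Honeyd configuration fragment for one emulated host type."""
    lines = [
        f"create {name}",
        f'set {name} personality "{personality}"',
        # Ports not listed below answer with a TCP reset, i.e. look closed.
        f"set {name} default tcp action reset",
    ]
    lines += [f"add {name} tcp port {port} open" for port in sorted(open_tcp_ports)]
    # Bind the template to each unused IP address it should answer for.
    lines += [f"bind {addr} {name}" for addr in addresses]
    return "\n".join(lines)

# Hypothetical profile derived from a scan of one production host.
config = honeyd_template(
    name="webhost",
    personality="Linux 2.4.20",          # placeholder fingerprint name
    open_tcp_ports={80, 22},
    addresses=["192.168.1.3", "192.168.1.4"],
)
```

Writing `config` to a file and passing it to honeyd would make the two unused addresses answer like the profiled host; that step is omitted here.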

To construct the proposed system shown in Fig. 1, a programming
language, network scanning tools, and an operating system need to be
chosen, although the architecture is general and not tied to a
particular choice. The operating system for the honeyd server needed
to be open source, for the flexibility it offers in deploying
security applications. For this purpose we used Linux Fedora 12,
which satisfies the requirements of our framework. The programming
language needed good networking libraries and easy integration with
the Fedora tools. We therefore selected Python, which provides the
networking libraries we need.
Figure 1. Hybrid honeypot architecture.

In the next step, a network scanning tool was needed to 
determine the operating systems used in the 
network and the open ports in the production network. 

Nmap was chosen for this purpose. It can be 
run in two different timing modes to gather data 
about the operating systems present in the network as 
well as their open ports and running services. The first 
is normal mode, in which the tool parallelizes the port 
scans. Information is collected in minimum 
time, but the server might be 
overloaded with input and output, and 
network traffic increases accordingly. The second 
is polite mode, in which the tool 
serializes the port scans and pauses between consecutive 
scans. This mode is gentle on the machine and the 
network, at the cost of a much longer scan time. Even so, 
it gives an accurate scan of the network. 
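The two modes can be illustrated with Nmap's timing templates, where `-T3` is the normal template and `-T2` is the polite one. The sketch below only builds the command line; the function name, the target range, and the choice of `-O` (OS detection) and `-oX` (XML output) are assumptions for illustration, since the paper does not list the exact options it used.

```python
from typing import List

# Nmap timing templates: -T3 is "normal" (the default), -T2 is "polite".
TIMING_FLAGS = {"normal": "-T3", "polite": "-T2"}


def build_nmap_command(targets: List[str], mode: str, xml_out: str) -> List[str]:
    """Return an argv list for an OS and port scan in the given timing mode.

    Hypothetical helper: it builds the command but does not run it.
    """
    if mode not in TIMING_FLAGS:
        raise ValueError(f"unknown mode: {mode}")
    # -O enables OS detection; -oX writes the results to an XML file.
    return ["nmap", TIMING_FLAGS[mode], "-O", "-oX", xml_out, *targets]


polite_cmd = build_nmap_command(["172.16.16.0/24"], "polite", "scan.xml")
```

The resulting list can be passed to `subprocess.run`; OS detection normally requires administrative privileges.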

As shown in Fig. 2, an Nmap scan begins by 
sending a ping to discover all the devices on the network 
and stores their IP addresses temporarily in a 
file. This file is used for the next scan, 
an operating system or port scan of the discovered IP 
addresses. The scan results are logged to a 
file, nowadays usually in XML format, which is analyzed 
repeatedly until the scan is complete. Once the scan has 
finished, an analyzer starts in a separate thread to 
extract the collected data from the file and 
automatically build a profile to store these data. 
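As a rough illustration of the analyzer step, the sketch below extracts a per-host profile from Nmap XML output with the Python standard library. The sample document is hand-written for the example (it is not output from the experiment), and the profile layout and function name are assumptions.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the shape of Nmap's XML output (illustrative only).
SAMPLE_XML = """<nmaprun>
  <host>
    <address addr="172.16.16.7" addrtype="ipv4"/>
    <ports>
      <port protocol="tcp" portid="139"><state state="open"/></port>
      <port protocol="tcp" portid="445"><state state="closed"/></port>
    </ports>
    <os><osmatch name="Microsoft Windows XP SP2" accuracy="98"/></os>
  </host>
</nmaprun>"""


def build_profiles(xml_text):
    """Map each IPv4 address to its first OS match and its open ports."""
    profiles = {}
    for host in ET.fromstring(xml_text).iter("host"):
        addr = host.find("address[@addrtype='ipv4']")
        if addr is None:
            continue
        osmatch = host.find("os/osmatch")
        open_ports = []
        for port in host.iter("port"):
            state = port.find("state")
            if state is not None and state.get("state") == "open":
                open_ports.append((port.get("protocol"), int(port.get("portid"))))
        profiles[addr.get("addr")] = {
            "os": osmatch.get("name") if osmatch is not None else None,
            "open_ports": open_ports,
        }
    return profiles


profiles = build_profiles(SAMPLE_XML)
```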

G. Deployment of Honeyds 

The basic idea behind the hybrid 
honeypot is to make use of unused IP addresses. 
The remaining task is how to 
distribute them among the running operating systems so as 
to minimize the likelihood of revealing the real 
production hosts in the network and exposing them to 
attack by intruders. A straightforward approach was 
adopted: the distribution of operating systems is kept 
approximately constant after the virtual systems are added 
to the production systems, 
excluding the physical honeypots. 

V. Evaluation and Experiments 

A. Redirection 



Our architecture takes advantage of 
redirection to balance load. The redirection 
procedure begins by retrieving from the data file the IP 
addresses to be redirected, based on the port 
numbers and operating systems emulated by the honeyds. 
Each production honeypot is configured with a maximum 
number of connections, which 
determines the maximum number of honeyds that 
can redirect to it. The ports of the honeyds are then 
compared with those of the production honeypots that can 
accept redirected connections. However, 
the number of honeyds that can be assigned to a production 
honeypot running a given operating system depends 
on the number of connections that can be 
handled by the system. 
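The matching step described above can be sketched as a simple greedy assignment. This is an illustrative reading of the procedure, not the authors' implementation: the dictionary keys, the one-connection-per-honeyd accounting, and the first-fit order are all assumptions.

```python
def assign_redirections(honeyds, production_honeypots):
    """Greedily map each honeyd to a production honeypot with the same OS.

    honeyds: list of dicts with keys "ip", "os", "ports".
    production_honeypots: list of dicts with keys "ip", "os", "ports",
    and "max_connections". Returns {honeyd_ip: production_ip}.
    """
    load = {p["ip"]: 0 for p in production_honeypots}
    mapping = {}
    for hd in honeyds:
        for prod in production_honeypots:
            same_os = prod["os"] == hd["os"]
            ports_ok = set(hd["ports"]) <= set(prod["ports"])
            has_room = load[prod["ip"]] < prod["max_connections"]
            if same_os and ports_ok and has_room:
                mapping[hd["ip"]] = prod["ip"]
                load[prod["ip"]] += 1
                break  # first fit; honeyds with no match stay unassigned
    return mapping


honeyds = [
    {"ip": "172.16.16.7", "os": "windows", "ports": [139]},
    {"ip": "172.16.16.8", "os": "windows", "ports": [139]},
]
production = [
    {"ip": "172.16.16.77", "os": "windows", "ports": [139, 445], "max_connections": 1},
]
mapping = assign_redirections(honeyds, production)
```

With a limit of one connection, only the first honeyd is mapped, which mirrors the capacity constraint in the text.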

For example, to redirect TCP port 139 
of the low-interaction honeypot at IP 
address 172.16.16.7 to the same port of the production 
(physical) honeypot at IP address 172.16.16.77, 
the honeyd set command is used 
to emulate the host and direct traffic to that 
production honeypot: 

set Comp172.16.16.7 port 139 tcp 

proxy: 172.16.16.77:139 

In addition, the system monitors 
and logs the honeynet traffic. 
This demonstrates the 
system's ability to redirect network traffic, log 
the information, and raise the relevant warnings. The 
distinguishing property of the proposed architecture is 
that it can divert malicious traffic away from the 
production hosts. Note that the level of protection 
depends on the availability of free IP 
addresses and honeyds. Protection 
increases markedly as more unused IP addresses are put to 
use, which can considerably reduce the 
attacks that reach the production systems. 

The benefit of redirecting to 
another honeypot is that we can directly 
observe a worm's self-propagation when the first 
honeypot contacted tries to reach a new target honeypot over 
the network. However, some 
malware infects a host in several stages, and if the first 
stage fails to 
set up completely, the malware will not show its 
self-propagating ability unless we allow some 
exploratory outgoing connections. For any honeypot, if we 
decide only later to pass on a connection, we 
no longer have the exact sequence numbers and therefore 
cannot proxy the connection from its onset. 

In another case, the destination of the request is 
a Dynamic Host Configuration Protocol 
(DHCP) or DNS service; such a request is diverted to the 
DNS or DHCP honeypot. As mentioned earlier, 
whenever a newly created honeypot is available we 
send outgoing requests to it, but 
detecting self-propagating behavior at this stage is 
important, and the saturation or capacity of the honeypots 
in use must be taken into account. 

In all other cases the outgoing request should be 
denied and the established connection dropped. While 
applying these conditions, we observed several distinct 
activities. A suspicious activity was detected by inspecting 
the honeyd logs. The attacker used a source address of 
172.16.16.105, which was known to us as one of our own 
systems that was unused and idle at the time, so 
that IP address had been spoofed to conceal the true 
origin. 

B. Free IP 

When the network scanning tool runs on the network, 
the server obtains data about the network IP addresses 
and the deployed operating systems. Fig. 3 shows the 
distribution of IP addresses between unused IP addresses 
and production systems; it indicates that the 
network can accommodate a significant number of virtual 
systems. 



Figure 2. Functionality of Low and High Interaction Honeypots. 
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■ Unused Ips ■ Production Systems 



Figure 3. Distribution of IP addresses between unused IP 
addresses and production systems. 

C. Protection of Free IP 

The free IP addresses on the network contribute to 
protection: the more spare IP addresses the honeypot 
system occupies, the better the production systems are 
protected. For example, if the number of free IP 
addresses equals the number of production hosts, the 
likelihood that an intruder attacks a production host is 
at least 50% if no hybrid honeypot is deployed there. 
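The 50% figure follows from a simple ratio if one assumes that the intruder picks a target uniformly at random among all addresses. That uniformity assumption, and the function name, are ours and are not stated in the paper.

```python
def production_hit_probability(production_hosts: int, other_addresses: int) -> float:
    """Chance that a uniformly random target is a production host.

    other_addresses counts the free (or honeypot-occupied) IP addresses.
    """
    total = production_hosts + other_addresses
    if total == 0:
        raise ValueError("no addresses to target")
    return production_hosts / total


equal_split = production_hit_probability(10, 10)   # 0.5
more_decoys = production_hit_probability(10, 30)   # 0.25
```

Under this assumption, every additional free address given to a honeyd lowers the chance that a random probe lands on a production host.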

D. Nmap 

Nmap was run on the network 
to measure 
and gather the components of the system [27]. The 
measurement proceeds as follows. In the first step, 
scanning starts in normal mode, which 
takes a different amount of time on different operating 
systems. On our Windows XP SP2 
test machine we measured an average of 3 to 4 
seconds, whereas on Linux Fedora the 
result was 49 seconds. 

In the first, normal-mode test, multiple threads run to 
collect information about multiple ports at the same time. 
Note that the results 
vary from machine to machine and depend on the 
communication conditions over the network, the level 
of traffic, and the system state; under heavy incoming 
traffic the tool sometimes needs to 
re-communicate with a particular system. 

To avoid excessive input and output traffic and to 
reduce the time taken to scan the network, 
the scans were divided into parallel runs until every 
one had been executed. The system then proceeds to 
the next step, measurement in polite mode. 
In polite mode, as discussed earlier, the tool 
serializes the scan of each individual port for 
every IP address. Two quantities are 
measured here: the time taken for 
each port, and the delay between finishing one scan and 
starting the next. 

In polite mode the time taken is high, 
and the delay between scans alone is approximately 
480 milliseconds, so the honeypot system and 
the network face an overhead. To address this, 
different IP 
addresses are allotted to multiple threads so that the 
system can run different IP scans in parallel. 
Running the polite-mode scan this way takes the system 
much less time on average. Compared with the 
Windows-based machine, the Linux-based system has the 
higher average. 
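One way to allot IP addresses to several threads is a thread pool, sketched below. The helper name, the worker count, and the idea of passing the per-host scan as a function are assumptions for illustration; the stand-in scan function here does no network I/O.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_in_parallel(ip_addresses, scan_one, workers=4):
    """Run scan_one(ip) for each address on a pool of worker threads.

    scan_one is supplied by the caller (for example, a function that runs
    a polite-mode scan of one host). Results follow the input order.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(scan_one, ip_addresses))
    return dict(zip(ip_addresses, results))


def fake_scan(ip):
    # Stand-in for a real per-host scan; returns a label only.
    return f"scanned {ip}"


report = scan_in_parallel(["172.16.16.7", "172.16.16.8"], fake_scan, workers=2)
```

Each host is still scanned politely on its own thread, so the per-host load stays low while the total wall-clock time drops.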

E. Experiment Conditions 

In our experiments, different policies can 
be implemented and applied conditionally, such as: 

If an outgoing request to a remote address exceeds the 
limit set for the honeypot, it is 
dropped. This is a general safety measure that is 
independent of the other safeguards. 

If an outbound DNS request originates from a 
DNS or DHCP server honeypot, 
it may be permitted. We set 
the address of our DNS or DHCP server honeypot, which 
configures itself automatically and runs. We permit such a 
node to send outbound DNS requests. 

If an outgoing request is for DNS and the source is an 
ordinary honeypot, it is diverted to the DNS 
or DHCP honeypot. Such a request is 
suspect, since ordinary honeypots are configured to send 
this traffic through the DNS or DHCP honeypot first. 

If an outgoing request comes from a honeypot 
that has not been penetrated, the request is 
abandoned. Likewise, if a honeypot tries to 
perform an activity such as an automatic update, we simply 
block it. This keeps the behavior of the honeypot on our 
network accurate. 

If an outbound request is directed to the address of the 
origin that initiated contact with the honeypot, the 
request is passed on. This lets communication return to 
the origin of the attack so that a multistage exploit 
can complete on the honeypot, so this type of request is 
always allowed. 
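The conditions above can be read as an ordered decision function. The sketch below is one possible encoding: the field names, the return labels, and in particular the precedence of the rules (taken from the order in which they are listed) are assumptions, since the paper does not say how conflicts between conditions are resolved.

```python
def decide_outbound(request, state):
    """Return "drop", "allow", or "redirect_dns" for an outbound request.

    request: dict with "src", "dst", "is_dns".
    state: dict with "dns_honeypots" (set of IPs), "compromised" (set of
    penetrated honeypot IPs), "attackers" (honeypot IP -> set of origin
    IPs), "counts" (honeypot IP -> requests so far), and "limit" (int).
    """
    src, dst = request["src"], request["dst"]
    if state["counts"].get(src, 0) >= state["limit"]:
        return "drop"            # honeypot has reached its outbound limit
    if request["is_dns"]:
        if src in state["dns_honeypots"]:
            return "allow"       # DNS/DHCP honeypot may send DNS requests
        return "redirect_dns"    # ordinary honeypot: divert to DNS honeypot
    if src not in state["compromised"]:
        return "drop"            # e.g. automatic updates from a clean honeypot
    if dst in state["attackers"].get(src, set()):
        return "allow"           # connection back to the origin of the attack
    return "drop"                # all other outgoing requests are denied


state = {
    "dns_honeypots": {"172.16.16.53"},
    "compromised": {"172.16.16.7"},
    "attackers": {"172.16.16.7": {"172.16.16.225"}},
    "counts": {"172.16.16.9": 5},
    "limit": 5,
}
```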

F. Logs Assessment 

The logs collected by honeyd gave a glimpse 
of possible attacks, revealing that connections to unused 
IPs were being made as a consequence of a 
random scan. The logs showed that the probing was carried 
out with Nmap. A system 
with the IP 172.16.16.225 tried to connect to a honeyd 
with a different IP, 172.16.16.123, through 
ports 135, 139, and 445. The network 
administrator was alerted, and we subsequently discovered 
that the source system had been infected by the 
Natche virus, which caused this type of connection. 

Traffic on the network was stored during honeypot 
operation and analyzed after about two weeks. 
We discovered a large amount of background activity. 
Fig. 4 and Fig. 5 show the TCP, UDP, FTP and other 
port scans per day recorded by NTM. 
The total IP activity observed on the machines before 
deploying any honeypot is shown in Fig. 4. 

Before our security architecture was deployed, a 
wide range of IP scans reached the various machines, 
meaning that port scans from any internal or 
external IP scanner could easily take place. Let us now 
look at the situation after deploying our proposed model. 
As Fig. 5 shows, the port and IP address activity on the 
machines, and hence port probing, is much lower. 

In the best recorded case, the proposed architecture 
shows fewer than 6 port probes in total, against 
around 24 on the same day in Fig. 4. In the worst 
case, Fig. 5 shows 96 port probes, against 
the 488 port scans reported for the same day in 
Fig. 4. 
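The relative reduction implied by these counts can be checked with a short calculation. The helper below is only arithmetic on the figures quoted in the text; treating "less than 6" as exactly 6 gives a lower bound for the best case.

```python
def reduction_percent(before: int, after: int) -> float:
    """Percentage drop in scan activity from `before` to `after`."""
    return 100.0 * (before - after) / before


best_case = reduction_percent(24, 6)     # 75.0 (a lower bound, since fewer than 6 were seen)
worst_case = reduction_percent(488, 96)  # roughly 80.3
```

In both cases roughly three quarters or more of the probing no longer reaches the machines; "best" and "worst" here refer to the absolute number of remaining probes.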



TABLE I. Captured worm/Trojan/intruder activity, with a few 
systematic reports by our proposed architecture; names 
extracted from Panda Antivirus 

Figure 4. Various IP port scan activity before implementation of our 
proposed architecture. 



Figure 5. Various IP port scan activity after implementation of our 
proposed architecture. 

Engaging an attacker takes time and, as 
mentioned before, the system then needs to 
restart the honeypot as a fresh instance; during this 
process the honeypot is occupied. To determine the number 
of honeypots needed in the absence of further filtering, we 
analyzed the traces using different 
honeypot life cycles. The life cycle consists of the attack 
engagement time plus the restart time. We 
observed life cycles from seven seconds up 
to approximately four minutes. Note 
that virtual machine monitors differ in how quickly they 
can recreate a machine; some take only a few 
seconds [16], while fully restarting a machine may 
take a full minute. 

To clarify Table I and identify the 
worms that caused infections, we imported the 
malicious executables into a virtual machine on which 
we had previously installed Panda Antivirus [7]. 
Identification is incomplete because 
some samples are reported as unknown. Since we do not 
have access to a large worm database, we list 
those unidentified samples as 
well. 

Our records cover not only buffer-overflow worms 
but also attacks on weak passwords. 
Most worm attacks needed multiple 
connections to complete their 
life cycle. For example, our system recorded 
72 connections for BAT.Boohoo.Worm, which is listed 
in the table and which tried to open a data transmission 
channel, separate from the initial connection, to 
send the worm executable and infect the system. This 
behavior shows the weakness of 
techniques that filter known attacks based on the first 
contact alone. Nevertheless, under the same conditions we 
discovered a large number of known worms and exploits 
that were revealed after their first activity on our lure 
machine. 

Table I also gives the minimum detection time for each 
of these worms. Detection time is measured 
from the moment the system 
receives the initial scan packet at the honeypot until 
the next honeypot is infected through redirection and 
attempts to create an outgoing connection, which is 
evidence that we have recorded self-propagating code. 

This detection time depends 
on various factors, such as the delay of the final host 
response, the stability of the network, the execution time 
of each worm within the honeypot, and the availability of 
honeypots; calculating these components is beyond the 
scope of this paper. 

VI. Scalability and Functionality 

High integrity: The proposed system should 
be a fully functional and accurate server so that known 
vulnerabilities and new attacks are diagnosed. 

Scalability: The system should be able 
to analyze the actual scanning probes and cover a large 
number of URLs. 

Isolation: The network communication of each honeypot 
must be established separately. 

Precise control: The system must be designed so that 
possible external attacks from the honeypots it hosts are 
contained. 

Wide coverage: The system must be able to configure 
honeypots with a variety of models and operating systems 
and to implement their services. 

This structure allows the honeypot systems to 
remain independent. This is desirable to cover the 
address space, to prevent the honeypots from being 
discovered by attackers, and 
to ensure visibility into a wide range of the Internet 
address space, since previous work has shown that 
activity differs considerably between different places in 
the network [6]. 

Because the architecture is scalable, the system can be 
extended by adding physical machines (production 
machines), production honeypots, and honeyds, which 
provides a stronger level of protection and security 
by decreasing the likelihood of attacks on 
production systems. Another matter to 
consider is the structure of our architecture. 



Our proposed system can be deployed in 
several subnets to provide greater protection 
for all of them. Periodic rescanning 
allows the system to reconfigure the 
operating systems and services 
that the honeyds imitate so that they reflect the current 
mix in the production network. 

An advantage of this architecture is that it 
does not require manual moderation, owing to its ability 
to discover abnormal network activity. A 
comparison between various security defense models and 
our proposed model is shown in Table II. Another issue, 
mentioned previously, is load balancing in 
the architecture, which is achieved through the 
proposed redirection capability. The physical (production) 
honeypots receive redirections from the honeyds 
based on the operating system. Because the distribution 
of operating systems among the 
honeyds mirrors that of the production network, it is 
anticipated that the honeypots will be loaded in 
proportion to the distribution of operating 
systems in the particular network. 

TABLE II. Comparison of various security defense 
frameworks 

VII. Conclusions 

The proposed hybrid honeypot architecture 
provides partial protection for the production systems. It 
does so by decreasing the likelihood that an 
attacker targets our production systems, 
deploying lure systems in the network so that the 
attacker cannot distinguish them by their 
status or fingerprint and takes the fake systems for 
real ones. This goal cannot be met without 
the redirection capability, and the production 
systems remain vulnerable to direct attacks 
that do not pass through the honeypot system. 

In the proposed design, the production honeypots 
play only a passive role: they log the 
activities of attackers so that the system 
administrator can extract and analyze them through data 
mining. They could play a more active role by analyzing 
the attackers' activities and reducing the different 
attack types with a signature file or a signature 
database that can be extended and 
mined. As we have shown, the honeypots are 
able to raise warnings and 
notify the administrator of the intruder type and 
of feasible ways to block the attack's 
propagation. 
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Abstract — The important feature of the Adaptive Cruise 
Control (ACC) system is the ability to maintain a proper 
inter-vehicle gap based on the speed of leading vehicle and 
the desired distance. Adaptive Cruise Control operates in 
two modes (i) Velocity Control mode, (ii) Distance Control 
mode. ACC acts like a conventional Cruise Controller 
(CC) under velocity control mode. In the case of the 
distance control mode, the speed of the host vehicle is 
reduced according to the surrounding environment to 
maintain the safe distance between the leading vehicle and 
the host vehicle. 25 rules have been used in Fuzzy logic 
Controller (FLC) with the knowledge base of the system. 
The inputs of the FLC are distance error and the speed 
error. The host vehicle adapts to the lead vehicle speed 
changes and tries to maintain a proper distance between 
them. The performance of the FLC based ACC is 
improved by Genetic Algorithm to tune the fuzzy rule 
base. Genetic Programming is used to select the best rule 
out of the 25 for a corresponding input. The result showed 
a better improvement over the Fuzzy Controlled System. 

Keywords - Adaptive Cruise Control; Genetic Algorithm; Fuzzy 
Logic Control 



I. 



Introduction 



Research on Intelligent Vehicle (IV) systems has been 
devoted to solving problems such as driver burden reduction, 
accident prevention, and traffic flow smoothing. Mentally, driving 
is a highly demanding activity - a driver must maintain a high 
level of concentration for long periods and be ready to react 
within a split second to changing situations. In particular, 
drivers must constantly assess the distance and relative speed 
of vehicles in front and adjust their own speed accordingly. A 
Cruise Control (CC) system was developed to assist the 
driver over long distances on highways when there is 
no vehicle ahead of the host vehicle. Adaptive Cruise 
Control (ACC) supports the driver in longitudinal control of 
the vehicle by operating in two modes: Velocity 
Control mode and Distance Control mode. In Velocity Control 
mode, ACC maintains the velocity preset by the 
driver. The stability of the ACC system is disturbed when a 
lead vehicle or an obstacle is present in the path of the vehicle 
fitted with ACC. This drawback is rectified by switching 
over to Distance Control mode, in which ACC automatically 
adjusts the host vehicle velocity to maintain a safe 
distance between the two vehicles. These systems are 
characterized by a moderately low level of throttle and brake 
authority. The limitation of conventional ACC systems is that 
they do not manage speeds under 30 km/h and, consequently, 
are not useful in traffic jams or urban driving. 
ACC systems are now capable of maintaining the controlled 
vehicle's position relative to the leading vehicle even in 
congested traffic by using stop-and-go features while 
autonomously maintaining a safe distance between the leading 
and following vehicles. The conventional CC system operates 
in only one mode, velocity control, whereas 
ACC has both velocity and distance control 
modes. In this paper, different inter-vehicle distances and 
speed levels are considered. Simulation results obtained 
from the ACC system using a Fuzzy Logic Controller (FLC) and 
a genetically tuned FLC are compared to validate the 
objective of this paper. 



II. FLC BASED ACC 

A Fuzzy Logic Controller is designed on the basis of fuzzy 
logic, which does not require a mathematical model but 
depends mainly on experience. Fuzziness describes event 
ambiguity: it measures the degree to which an event occurs, 
not whether it occurs. Fuzzy theory is a powerful tool for 
exploring complex problems because it can 
determine outputs for a given set of inputs without a 
conventional mathematical model. Fuzzy theory is 
easily understood because it can be made to resemble a 
high-level language instead of a mathematical language. To 
describe a universe of discourse, fuzzy sets with names such 
as "hot" and "cold" are used to create a membership function. 
By determining the degree of membership of an input in the 
fuzzy sets, membership functions decode the linguistic 
terminology into values a computer can use [7]. A Fuzzy Logic 
Controller is represented by a set of 
if-then rules [3]. Each fuzzy rule consists of an antecedent 
and a consequent. The antecedent is a condition in the application 
domain and the consequent is a control action for the system 
under control. The fuzzy inference engine employs the fuzzy 
knowledge base to simulate human decision making and infer 
fuzzy control actions. Finally, the defuzzifier module 
translates the processed fuzzy data into crisp data suited 
to real-world applications [4]. 

A. Framework of the Fuzzy Logic Controller 

[Block diagram with blocks: Input, Fuzzy knowledge rule base, 
Fuzzy rule base, Fuzzification, Inference engine, 
Defuzzification, Output] 

Figure 1. Framework of Fuzzy Logic Controller 

The inputs are fuzzified according to the input membership 
functions. The rule strength is found by combining the 
fuzzified inputs according to the fuzzy rules. The 
consequence of each rule is found by combining the rule 
strength and the output membership function. The fuzzified 
output is then defuzzified to obtain a 
crisp value. The defuzzification method is the weighted average of 
all rule outputs [8]. 
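The steps above (fuzzify, compute rule strengths, defuzzify by weighted average) can be sketched in a few lines. This is a deliberately reduced illustration: it uses three hypothetical triangular sets and five hypothetical rules with single-valued outputs instead of the paper's five sets and 25 rules, and it assumes the rule strength is the minimum of the two input memberships.

```python
def tri(x, a, b, c):
    """Triangular membership with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


# Hypothetical input sets on a normalized error range [-1, 1].
SETS = {"N": (-2.0, -1.0, 0.0), "Z": (-1.0, 0.0, 1.0), "P": (0.0, 1.0, 2.0)}

# Hypothetical rules: (distance-error set, speed-error set) -> output centre.
# Negative outputs stand for braking, positive outputs for acceleration.
RULES = {
    ("N", "N"): -1.0, ("N", "Z"): -0.5, ("Z", "Z"): 0.0,
    ("P", "Z"): 0.5, ("P", "P"): 1.0,
}


def controller(x_error, s_error):
    """Fuzzify both inputs, fire each rule with min, then take the
    strength-weighted average of the rule outputs."""
    num = den = 0.0
    for (x_set, s_set), out in RULES.items():
        strength = min(tri(x_error, *SETS[x_set]), tri(s_error, *SETS[s_set]))
        num += strength * out
        den += strength
    return num / den if den else 0.0


command = controller(0.5, 0.0)  # halfway between "Z" and "P" on distance error
```

For a distance error of 0.5 and zero speed error, the "Z, Z" and "P, Z" rules fire equally, so the command is the midpoint of their outputs.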

B. Inputs of the Fuzzy logic controller 

A two-input, single-output fuzzy logic controller is 
used. The inputs of the Fuzzy Logic Controller are the distance 
error (Xerror) and the speed error (Serror). The distance error 
(1) is the difference between the actual distance (inter-vehicle 
distance, Xactual) and the desired distance (Xdesired). Three 
different distance levels are considered for simulation, 
as shown in Fig. 2: in Fig. 2(a) the distance varies from 7 to 
13.3 m, in Fig. 2(b) from 5 to 6 m, and in Fig. 2(c) 
from 2 to 4.4 m. The actual distance can be 
measured using an ultrasonic sensor. The desired distance is the 
distance that must be maintained between the vehicles 
to avoid a rear-end collision. The desired distance changes in 
direct proportion to the vehicle speed. 




Figure 2. Actual Distance (panels (a), (b), and (c)) 

The speed error is the difference 
between the leading vehicle speed (Slead) and the 
host vehicle speed (Shost) according to (4). 

$$X_{error} = X_{actual} - X_{desired} \qquad (1)$$

$$X_{desired} = X_{safe} + THW \cdot V_h \qquad (2)$$

$$THW = \frac{\text{clearance}}{\text{lead vehicle velocity}} \qquad (3)$$



The velocity of the leading vehicle is found as the sum of the 
host vehicle speed and the distance error according to (5). 

$$S_{error} = S_{lead} - S_{host} \qquad (4)$$

$$S_{lead} = S_{host} + X_{error} \qquad (5)$$


(IJCSIS) International Journal of Computer Science and Information Security, 

Vol. 9, No. 2, February 2011 
D. Output of FLC Based ACC 



The output of the Fuzzy Controller determines acceleration or 
braking which drives the vehicle. 

C. Fuzzification of Inputs 

The Mamdani fuzzy inference method is used. The 
fuzzy sets are represented by the linguistic variables 
(i) Negative Medium (NM), (ii) Negative Small (NS), 
(iii) Zero Error, (iv) Positive Small, and (v) Positive Medium [10]. 
Xerror and Serror are the two inputs of the Fuzzy Logic 
Controller. The output is the ACC command that gives the 
desired braking or acceleration for the current input. The 
inputs and output of the Fuzzy Logic Controller are 
represented by triangular membership functions. 
The positive side of the output 
represents the acceleration command and the negative side 
represents the braking command. 25 rules have been generated 
with the knowledge base of the system. Table 1 gives the 
relation between the inputs and the fuzzy output. 

TABLE 1. FUZZY RULE BASE 
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This system is modeled based upon the equations. The value 
of X desired depends on the Speed of the host vehicle. The 
desired distance varies proportional to the speed of the host 
vehicle. The value of X error is negative when X actual is less 
than the X desired, therefore the Speed of the host vehicle has 
to be reduced. The value of X error is positive when the X 
actual is greater than the X desired; therefore the speed of the 
host vehicle has to be increased. Thus this controls the output 
of the ACC vehicle. The host vehicle is adapted to the lead 
vehicle with minimum error. 
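The sign logic above can be made concrete with a short sketch based on the definitions in the text: the desired distance grows with host speed, the distance error is actual minus desired, and the speed error is lead speed minus host speed. The function names, the numeric values, and the time-headway parameter `thw` are illustrative assumptions.

```python
def desired_distance(x_safe, thw, v_host):
    """Desired gap: a fixed safe distance plus time headway times host speed."""
    return x_safe + thw * v_host


def distance_error(x_actual, x_desired):
    """Distance error (1): actual inter-vehicle distance minus desired distance."""
    return x_actual - x_desired


def speed_error(s_lead, s_host):
    """Speed error (4): lead vehicle speed minus host vehicle speed."""
    return s_lead - s_host


def speed_action(x_error):
    """Negative error means the gap is too small, so the host slows down."""
    if x_error < 0:
        return "decrease"
    if x_error > 0:
        return "increase"
    return "hold"


# Hypothetical numbers: 2 m safe distance, 1.5 s headway, host at 4 m/s.
x_des = desired_distance(2.0, 1.5, 4.0)   # 8.0 m
x_err = distance_error(7.0, x_des)        # -1.0 m, gap too small
action = speed_action(x_err)              # "decrease"
```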
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(c) 
Figure 3. Output of the fuzzy controlled ACC for the given conditions 



The host vehicle adapts to the lead vehicle for the 
various inter-vehicle distances shown in Fig. 2. 
For the third case the level of adaptation was very poor. 

III. GA Based Fuzzy Controlled ACC 

Genetic Algorithms are computing algorithms that solve 
optimization problems using evolutionary 
principles known from biology. Evolution is a process that 
operates on chromosomes (organic devices for encoding the 
structure of living beings) rather than on living beings. The 
processes of natural selection cause those chromosomes that 
encode successful structures to reproduce more often than 
those that do not. Recombination creates different 
chromosomes in offspring by combining genes from the 
chromosomes of the two parents. Mutation may cause the 
chromosomes of children to differ from those of their 
parents [7]. Genetic algorithms are used here to maximize the 
performance of a fuzzy logic controller by searching a 
given knowledge base for rules, with the goal of 
minimizing the number of rules required. The GA eliminates 
the unnecessary rules that make no significant contribution 
to system performance [5]. 

A. Optimization of Fuzzy rule base 

Xactual is the distance between the leading vehicle and 
the host vehicle for which the simulation is done. Xerror and 
Serror are the two inputs of the GA-based fuzzy controlled 
ACC. Fuzzification is the process that converts a crisp value 
into a fuzzy value. 25 rules have been generated with the 
knowledge base of the system. Each linguistic 
term of the membership function is converted to a binary 
string by assigning it a binary number [12]. 
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D. Simulation output 

The new generation has been formed after crossover and 
mutation. The generated 25 rules have been reduced to 2 rules 
using the Genetic Algorithm. The surface view of the 
optimized rule is shown in Fig.4 



Negative medium    000 
Negative small     001 
Zero error         010 
Positive small     011 
Positive medium    100 


Ten random rules are drawn from the fuzzy rule base. The rule 
strength is calculated with respect to the antecedent and 
consequent of each fuzzy rule. Chromosomes are then selected, 
the GA operators crossover and mutation are 
applied, and the next generation is formed. 

B. Crossover 

Crossover is a process in which information is systematically 
exchanged between two chromosomes; it is implemented using 
probabilistic decisions. Crossover is applied with a given 
crossover probability. Two parents are randomly selected; let 
them be parents 1 and 3. A random number is generated, and if 
it is less than the crossover probability, crossover is 
performed: a crossover site is selected randomly, the 
chromosomes are split there, and the bits after the site are 
interchanged to form the next generation. 

Parents             = 10000010 
                      00110100 
Crossover offspring = 10000100 
                      00110010 
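
The crossover step above can be sketched in Python as follows. This is only an illustrative sketch, not the authors' implementation: the function names, the use of bit strings, and the `rng` parameter are assumptions. The crossover site of 5 is inferred from the parent and offspring strings shown above.

```python
import random


def single_point_crossover(parent_a, parent_b, site):
    """Swap the tails of two equal-length bit strings after index `site`."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same length")
    child_a = parent_a[:site] + parent_b[site:]
    child_b = parent_b[:site] + parent_a[site:]
    return child_a, child_b


def crossover(parent_a, parent_b, pc, rng):
    """Apply crossover only if a random draw is below the probability pc.

    `rng` is a random.Random instance (an assumption, used so that runs
    can be reproduced with a seed).
    """
    if rng.random() >= pc:
        return parent_a, parent_b  # no crossover this time
    site = rng.randrange(1, len(parent_a))  # random site inside the string
    return single_point_crossover(parent_a, parent_b, site)


# The example from the text, with the site fixed at 5.
offspring = single_point_crossover("10000010", "00110100", 5)
print(offspring)
```

Fixing the site makes the example reproducible; in the probabilistic version the site is drawn at random for each selected pair.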

C. Mutation 

Mutation is the occasional alteration of a value at a 
randomly selected bit position. Mutation is applied with a 
given mutation probability (pc = 0.6). A parent is randomly 
selected; let it be parent 3. A random number is generated, 
and if it is less than the mutation probability, mutation is 
performed by selecting a mutation site randomly and flipping 
the bit at that site [11]. 

Parent    = 10000100 
Offspring = 10000110 
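
A minimal Python sketch of the bit-flip mutation described above. The function name and the zero-based site index are assumptions made for illustration; the site 6 is inferred from the parent and offspring strings in the text.

```python
def flip_bit(chromosome, site):
    """Return a copy of the bit string with the bit at index `site` flipped."""
    flipped = "1" if chromosome[site] == "0" else "0"
    return chromosome[:site] + flipped + chromosome[site + 1:]


# The example from the text: flipping index 6 (zero-based) of the parent.
print(flip_bit("10000100", 6))
```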




Figure 4. Surface view of optimized rules 

Figure 5. GA based Fuzzy Controller Output for the given scenarios 

IV. CONCLUSION 

Adaptive Cruise Control has been designed using Fuzzy Logic 
Control. 25 rules were generated from the knowledge base of 
the system. The host vehicle tries to maintain the distance 
so that Xerror remains almost zero. The FLC based ACC system 
developed a large error when the distance was less than 5 m. 
To reduce this error, a Genetic Algorithm is used to optimize 
the fuzzy rule base. With the optimized rule base, the host 
vehicle adapts to changes in the lead vehicle speed more 
efficiently. 

Abstract: This paper proposes a fuzzy rule based design 
approach to prevent the headlight glare emitted by 
oncoming vehicles on highways. This gradually reduces 
accidents on highways, because the driver is kept in a 
comfortable zone instead of being blinded by the lights of 
the oncoming vehicle. In conventional vehicles the 
illumination is adjusted manually by the driver. In this 
fuzzy based approach, a fuzzy sensor and a controller are 
embedded inside the windshield or fitted on to it; they 
generate ambient illumination for the driver, thereby not 
ruining the driver's vision at night. This setup has to be 
installed on all vehicles so that it prevents accidents. 
The fuzzy sensor and the controller make use of fuzzy 
rules. The light intensity emitted by the oncoming vehicle 
and received by the fuzzy sensor is fuzzified using a 
triangular membership function and checked against the 
tolerance limit. If it is not within the acceptable limit, 
the fuzzy sensor forwards it to the fuzzy controller, which 
converts the light intensity to an ambient light source, 
thereby defuzzifying the output. 

Key words- Fuzzy logic; fuzzy rules; fuzzy sensor; fuzzy 
controllers; fuzzification; defuzzification; Headlight glare 

I. INTRODUCTION 

Around the world, more than 1.2 million people lose their 
lives in road accidents every year, and 3 to 4% of Gross 
National Product is lost in road accidents. One person is 
killed in a road accident every three minutes. The total 
number of annual road accident deaths is more than the total 
population of Mauritius. 



Headlight glare is the main challenge for drivers at 
night. Drivers are affected by dazzling high intensity 
headlights, which impair their vision and result in 
accidents. The blinding effect may be nearly total if the 
lights have not been switched from high beam, but even on 
low beam there is significant discomfort and reduced 
visibility. This paper proposes a fuzzy based approach to 
reduce headlight glare. A fuzzy sensor and a fuzzy 
controller, embedded in the windshield during its lamination 
process or fitted on to the windshield, provide a solution 
to headlight glare. The sensor checks whether the light 
source is over or under the tolerance limit. The controller 
then converts it to low intensity if it is of high intensity, 
and vice versa, providing an ambient light source. 

The light intensity (I), measured in volts, and the 
distance (D), in metres, are received by the fuzzy sensor. 
These input parameters are crisp (numerical) values. The 
crisp values are converted into fuzzy sets by fuzzification 
and are evaluated using the fuzzy rules. The output light 
intensity (OI) calculated using the fuzzy rules is checked 
against the tolerance limit by the fuzzy sensor. If it is 
beyond the tolerance limit, the fuzzy sensor defuzzifies it 
using Centroid of Area and then sends it to the fuzzy 
controller, which converts it to an ambient light source. 
The process of fuzzification and defuzzification is repeated 
in the fuzzy controller. 

In [1], a fuzzy controller that controls the brake rate of 
a vehicle is described. The speed of the vehicle to which the 
brake has to be applied and the distance of the vehicle from 
the point at which it has to stop are passed as input 
parameters to a fuzzifier. The controller compares these 
inputs with the rule base and gives the desired output. 
In [2], an automatic fuzzy controller that controls the 
switching of the headlight intensity of automobiles is 
proposed. 

References [3, 5, 6, 8, 9, 11, 12] give a basic 
understanding of crisp sets, their conversion to fuzzy sets, 
the concepts of a fuzzy controller, and fuzzy expert systems. 
This paper presents the methodology used in the fuzzy sensor 
(fuzzification of input variables, rule evaluation and 
defuzzification) in Section II, the implementation in 
Section III and the conclusion in Section IV. 



II. METHODOLOGY 

The fuzzy sensor, with its input parameters I (input 
intensity) and D (distance) and its output parameter OI 
(sensor output), is shown in Fig. 1. The figures below were 
produced using MATLAB. 

For the fuzzification of these parameters, linguistic 
variables are used (Tables I, II, III). The input Intensity 
(I) consists of 6 fuzzy sets, Distance (D) has 10 fuzzy sets 
and the output parameter Output Intensity (OI) consists of 
6 fuzzy sets. 

Table I. Linguistic variables for input Intensity I (V) and their 
numerical range 

Linguistic value    Notation    Numerical range 
JustNoticeable      JN          0-3.50 
Noticeable          N           3.00-6.50 
Satisfactory        S           5.00-8.50 
JustAcceptable      JA          7.00-10.50 
Disturbing          D           9.00-12.50 
UnBearable          UB          11.00-14.50 

Fig. 1. The structure of the fuzzy sensor 



Table II. Linguistic variables for Distance D (m) and their 
numerical range 

Linguistic value    Notation    Numerical range 
VeryClose           VC          0-25 
Close               CL          12-50 
VeryNear            VN          37-75 
Near                N           62-100 
ModeratelyNear      MN          87-125 
ModeratelyFar       MF          110-150 
Far                 F           135-175 
VeryFar             VF          160-200 
PrettyVeryFar       PVF         185-225 
Boundary Zone       BZ          210-250 

A. Fuzzy inference process 



The fuzzy inference process defines a set of fuzzy "if-then" 
rules. Most fuzzy logic based systems use rule bases to 
represent the relation among the linguistic variables and to 
derive actions from sensor input. 
The fuzzy inference process is performed in four steps: 

1. Fuzzification of the input variables. 

2. Defining membership functions. 

3. Rule evaluation. 

4. Defuzzification. 



1) Fuzzification: Fuzzification is the process of 
converting the crisp input variables to fuzzy variables. It 
maps the range of each input to the membership values of each 
fuzzy set. The crisp values obtained for the input 
parameters I and D are converted into fuzzy sets. 



Table III. Linguistic variables for sensor output light source 
OI (V) and their numerical range 

Linguistic value    Notation    Numerical range 
JustNoticeable      JN          0-3.50 
Noticeable          N           3.00-6.50 
Satisfactory        S           5.00-8.50 
JustAcceptable      JA          7.00-10.50 
Disturbing          D           9.00-12.50 
UnBearable          UB          11.00-14.50 



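2) Defining Membership functions: After fuzzification 
is done, the next step is to define the membership 
functions of the fuzzy sets for the input and output 
parameters. The triangular membership function is used 
for constructing the fuzzy sets. The membership 
functions of the input parameters are shown in Figs. 2 
and 3. The membership function of the output parameter 
is shown in Fig. 4. The fuzzy membership expressions for 
the input Intensity (I) and Distance (D) are given by 
Eqs. (1) and (2). 

Fig. 2. The membership function of I (InputIntensity) 

\[
\begin{aligned}
\mu_{JA}(x) &= \begin{cases} 0, & x < 7.00 \\ \dfrac{x - 7.00}{1.75}, & 7.00 \le x \le 8.75 \\ \dfrac{10.50 - x}{1.75}, & 8.75 \le x \le 10.50 \\ 0, & 10.50 < x \end{cases} \\
\mu_{D}(x) &= \begin{cases} 0, & x < 9.00 \\ \dfrac{x - 9.00}{1.75}, & 9.00 \le x \le 10.75 \\ \dfrac{12.50 - x}{1.75}, & 10.75 \le x \le 12.50 \\ 0, & 12.50 < x \end{cases} \\
\mu_{UB}(x) &= \begin{cases} 0, & x < 11.00 \\ \dfrac{x - 11.00}{1.75}, & 11.00 \le x \le 12.75 \\ \dfrac{14.50 - x}{1.75}, & 12.75 \le x \le 14.50 \\ 0, & 14.50 < x \end{cases}
\end{aligned}
\tag{1}
\]

Fig. 3. The membership function of D (Distance) 

\[
\begin{aligned}
\mu_{CL}(x) &= \begin{cases} 0, & x < 12 \\ \dfrac{x - 12}{19}, & 12 \le x \le 31 \\ \dfrac{50 - x}{19}, & 31 \le x \le 50 \\ 0, & 50 < x \end{cases} \\
\mu_{VN}(x) &= \begin{cases} 0, & x < 37 \\ \dfrac{x - 37}{19}, & 37 \le x \le 56 \\ \dfrac{75 - x}{19}, & 56 \le x \le 75 \\ 0, & 75 < x \end{cases} \\
\mu_{N}(x) &= \begin{cases} 0, & x < 62 \\ \dfrac{x - 62}{19}, & 62 \le x \le 81 \\ \dfrac{100 - x}{19}, & 81 \le x \le 100 \\ 0, & 100 < x \end{cases} \\
\mu_{MN}(x) &= \begin{cases} 0, & x < 87 \\ \dfrac{x - 87}{19}, & 87 \le x \le 106 \\ \dfrac{125 - x}{19}, & 106 \le x \le 125 \\ 0, & 125 < x \end{cases} \\
\mu_{MF}(x) &= \begin{cases} 0, & x < 110 \\ \dfrac{x - 110}{20}, & 110 \le x \le 130 \\ \dfrac{150 - x}{20}, & 130 \le x \le 150 \\ 0, & 150 < x \end{cases} \\
\mu_{F}(x) &= \begin{cases} 0, & x < 135 \\ \dfrac{x - 135}{20}, & 135 \le x \le 155 \\ \dfrac{175 - x}{20}, & 155 \le x \le 175 \\ 0, & 175 < x \end{cases} \\
\mu_{VF}(x) &= \begin{cases} 0, & x < 160 \\ \dfrac{x - 160}{20}, & 160 \le x \le 180 \\ \dfrac{200 - x}{20}, & 180 \le x \le 200 \\ 0, & 200 < x \end{cases} \\
\mu_{PVF}(x) &= \begin{cases} 0, & x < 185 \\ \dfrac{x - 185}{20}, & 185 \le x \le 205 \\ \dfrac{225 - x}{20}, & 205 \le x \le 225 \\ 0, & 225 < x \end{cases} \\
\mu_{BZ}(x) &= \begin{cases} 0, & x < 210 \\ \dfrac{x - 210}{20}, & 210 \le x \le 230 \\ \dfrac{250 - x}{20}, & 230 \le x \le 250 \\ 0, & 250 < x \end{cases}
\end{aligned}
\tag{2}
\]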
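
The triangular membership functions above can be evaluated with a few lines of Python. This is a sketch for illustration only (the paper itself uses the MATLAB Fuzzy Logic Toolbox). The function names are hypothetical, and the peaks of the JN, N and S sets are assumed to lie at the midpoints of their Table I ranges, by analogy with the JA, D and UB sets.

```python
def triangular(x, a, b, c):
    """Triangular membership: 0 outside (a, c), rising to 1 at the peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# (left foot, peak, right foot) per intensity set, taken from Table I.
# Peaks for JN, N and S are ASSUMED to be range midpoints.
INTENSITY_SETS = {
    "JN": (0.0, 1.75, 3.5),
    "N": (3.0, 4.75, 6.5),
    "S": (5.0, 6.75, 8.5),
    "JA": (7.0, 8.75, 10.5),
    "D": (9.0, 10.75, 12.5),
    "UB": (11.0, 12.75, 14.5),
}


def fuzzify_intensity(x):
    """Map a crisp intensity (V) to its membership degree in every set."""
    return {name: triangular(x, *abc) for name, abc in INTENSITY_SETS.items()}


# An intensity of 8.0 V belongs partly to S and partly to JA.
print(fuzzify_intensity(8.0))
```

Because neighbouring ranges overlap, a crisp input usually activates two adjacent sets at once, which is what allows several rules to fire together during rule evaluation.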




Fig. 4. The membership function of OI(Output Intensity of sensor ) 



3) Rule evaluation: The fuzzy input values are processed 
using the set of rules. A rule in fuzzy control consists of a 
condition (IF) followed by a control action (THEN). Each rule 
processes the information using different input parameters, 
so the output of each rule is different. To construct the 
fuzzy rules, we build a rule matrix (Table IV) and the rule 
bases. 



Table IV. Rule matrix representation 

D/I    JN    N     S     JA    D     UB 
BZ     JN    JN    JN    S     JA    D 
PVF    JN    JN    N     S     D     D 
VF     JN    JN    N     S     D     D 
F      JN    N     S     S     D     D 
MF     JN    N     S     JA    D     D 
MN     N     N     S     JA    D     D 
N      S     S     S     JA    D     UB 
VN     S     S     S     D     D     UB 
CL     JA    JA    D     D     UB    UB 
VC     JA    D     D     UB    UB    UB 



From the rule matrix we arrive at the rule bases. 
There are 10 x 6 = 60 rules for the fuzzy sensor. Each rule 
consists of an antecedent part and a consequent part. The 
antecedent part consists of input linguistic variables that 
may be combined using AND operators. The consequent part 
contains the output of the fuzzy rule. Fig. 5 shows the rule 
base for the sensor. In that figure, I = 13.9, D = 250 and 
OI = 7.25. This implies that the output light intensity is 
moderate; the sensor judges it to be within the acceptable 
limit and need not send it to the controller. 
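
The rule matrix of Table IV can be held in a simple lookup structure. The Python sketch below is an illustration, not the MATLAB rule base used in the paper; the names `RULES` and `sensor_rule` are hypothetical. It returns only the consequent label of a single rule and does not perform the full inference over overlapping sets.

```python
# Column order of Table IV: input intensity labels.
INTENSITY_LABELS = ["JN", "N", "S", "JA", "D", "UB"]

# One row per distance label; entries are the output (OI) labels.
RULES = {
    "BZ": ["JN", "JN", "JN", "S", "JA", "D"],
    "PVF": ["JN", "JN", "N", "S", "D", "D"],
    "VF": ["JN", "JN", "N", "S", "D", "D"],
    "F": ["JN", "N", "S", "S", "D", "D"],
    "MF": ["JN", "N", "S", "JA", "D", "D"],
    "MN": ["N", "N", "S", "JA", "D", "D"],
    "N": ["S", "S", "S", "JA", "D", "UB"],
    "VN": ["S", "S", "S", "D", "D", "UB"],
    "CL": ["JA", "JA", "D", "D", "UB", "UB"],
    "VC": ["JA", "D", "D", "UB", "UB", "UB"],
}


def sensor_rule(distance_label, intensity_label):
    """IF D is distance_label AND I is intensity_label THEN OI is the result."""
    return RULES[distance_label][INTENSITY_LABELS.index(intensity_label)]


print(sensor_rule("VC", "UB"))   # very close and unbearable
print(sum(len(row) for row in RULES.values()))  # number of rules
```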



Fig. 5. Computing the value of OI for I = 13.9 and D = 250 

If the output light intensity OI is high (9.00 and above), 
the sensor sends it to the controller, and the fuzzy 
controller (Fig. 6) converts it into an ambient light source. 

Fig. 6. The structure of the Fuzzy Controller 

The fuzzy controller accepts the OutputIntensity (OI); if it 
is either Disturbing (D) or UnBearable (UB), the controller 
converts it to an ambient light source. The fuzzy 
controller's output is the ControllerOutputIntensity (COI) 
(Table V). 

Table V. The linguistic variables for COI and their numerical range 

Linguistic value      Notation    Numerical range 
ReduceLightSource     RLS         9.00-14.50 
AmbientLightSource    ALS         0-9.00 




Here COI = 7.45. This implies that the controller output 
light intensity is moderate: the sensor judged OI not to be 
within the acceptable limit and sent it to the controller. 

Fig. 8. Computing the value of COI for OI = 9.5 



4) Defuzzification: The fuzzy sets are converted to crisp 
values. Here the fuzzy sets represented by OI (sensor output) 
and COI (fuzzy controller output) are converted to crisp 
(numerical) values. The Centre of Area (COA) method has been 
used. The general formula for COA is given by Eq. (3): 

\[
z^{*} = \frac{\int \mu_{C}(z)\, z \, dz}{\int \mu_{C}(z)\, dz}
\tag{3}
\]
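
Eq. (3) can be approximated numerically. The Python sketch below is an illustration under stated assumptions rather than the MATLAB routine used in the paper: the integrals are replaced by midpoint sums over a uniform grid, and the function names and the number of grid steps are hypothetical.

```python
def triangular(x, a, b, c):
    """Triangular membership with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def centroid_of_area(mu, lo, hi, steps=10000):
    """Approximate z* = (integral of mu(z) z dz) / (integral of mu(z) dz)."""
    dz = (hi - lo) / steps
    num = 0.0
    den = 0.0
    for k in range(steps):
        z = lo + (k + 0.5) * dz   # midpoint of the k-th sub-interval
        w = mu(z)
        num += w * z
        den += w
    if den == 0.0:
        raise ValueError("membership function is zero on the whole range")
    return num / den              # the common factor dz cancels


# Centroid of the JustAcceptable set (7.00-10.50 V) over the output range.
z_star = centroid_of_area(lambda z: triangular(z, 7.0, 8.75, 10.5), 0.0, 14.5)
print(round(z_star, 2))
```

For a single symmetric triangle the centroid coincides with the peak; for the clipped, aggregated output of several rules the same function applies unchanged once `mu` describes the aggregated set.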



Fig. 7. The membership function for the ControllerOutputIntensity (COI) 

The rule base for the fuzzy controller is shown in 
Table VI. 

Table VI. The rule base for the Fuzzy controller 

Rule    OI    COI 
1       JN    ALS 
2       N     ALS 
3       S     ALS 
4       JA    ALS 
5       D     RLS 
6       UB    RLS 



Fig. 8 shows the rule base for the fuzzy controller. In 
that figure, OI = 9.5 and COI = 7.45. 



III. IMPLEMENTATION 

The MATLAB Fuzzy Logic Toolbox has been used to 
encode fuzzy sets, membership functions, fuzzy rules and to 
perform inference process for both the fuzzy sensor and the 
fuzzy controller. 

IV. CONCLUSION 

This paper has proposed a fuzzy rule based approach to 
prevent headlight glare, which in turn minimizes 
accidents. The fuzzy sensor and the controller use the 
fuzzy rule bases to control the intensity of light. 
Conventional controllers would not be very efficient in 
controlling headlight glare because they offer only discrete 
values (high or low beam), whereas the fuzzy controller 
provides continuous light intensities. The fuzzy sensor and 
the controller have to be embedded inside the windshield 
during the windshield lamination process, or fitted on to 
it. This fuzzy system, comprising the sensor and the 
controller, reduces headlight glare and therefore reduces 
accidents on highways during travel at night. The system 
would be a great boon to drivers, as driving becomes 
comfortable without ruining the vision of the drivers at 
both ends. This setup has to be installed on all vehicles 
in order to prevent accidents. 

Abstract - In recent years there has been considerable 
interest in analyzing the relative trust level of web 
objects. The web contains facts and assumptions on a global 
scale, which results in various criteria for trusting a web 
page. In this paper an algorithm is proposed that assigns a 
rank to every web object, such as a requested document on 
the web; the rank specifies the quality of that object, or 
the relative level of trust one can place in that web page. 
It is used for object level information extraction for 
ranking search results and is implemented in C++. The 
behavior of the object rank for different values of the 
moister factor in a domain is analyzed. The results 
emphasize that the moister factor can be useful in rank 
computation and in exploring more web pages in alignment 
with the user's requirements. 

Keywords- Random Surfer Model, Information 
Computing, Web Objects, Information Retrieval System, 
Web Graph, Ranking, Object Rank. 



I. INTRODUCTION 



Information computing in various web domains broadly means 
extracting web objects of an unstructured nature, such as 
text objects, that satisfy an information need from within 
large collections using document-level ranking, as well as 
the structured information about real-world objects that is 
embedded in static web pages. Online databases exist on the 
web in huge numbers and are of an unstructured nature. 
Unstructured data refers to data that does not have a clear, 
semantically obvious structure [7]. In other words, 
information computing is the process of searching, 
recovering, and understanding information from huge amounts 
of stored data. Information can be retrieved from the web 
by searching techniques such as keyword based searching, 
concept-based searching, hybrid search, and knowledge base 
search. Object level information computing requires domain 
based search. Every commercial information retrieval system 
tries to facilitate a user's access to information that is 
relevant to his information needs. This paper highlights 
the ranking problem for domain based information retrieval: 
every owner of a document wants to improve the ranking of 
that document, and to do so can manipulate it in many ways, 
for example by increasing the number of links to the page 
from dummy pages [1]. Object based information computing 
maintains the integrity of the search results based upon 
various lexicons. As the web contains contradictions and 
hypotheses on a huge scale, finding the relevant 
information using search engines is a tedious job. With 
the help of object level ranking [22], the various objects 
in a domain can be prioritized, independently of the query, 
by a rank that describes the relative trust of each web 
page. The object rank of a page depends upon various 
factors associated with the web object. 

The organization of the paper is as follows. Related 
work is presented in section 2. Section 3 discusses the 
challenges of high quality search results. In section 4, 
Web_Object_Rank algorithm is proposed and discussed. 
The algorithm is implemented in section 5. Finally Section 
6 concludes the paper on the basis of the results obtained. 

II. RELATED WORK 

Google is a prototype of a large-scale search engine 
that makes heavy use of the structure present in hypertext 
[1]. Google is designed to crawl and index the web 
efficiently and produce much more satisfying search 
results than existing systems. Link Analysis Ranking [16] 
emphasize that hyperlink structures are used to determine 
the relative authority of a web page and produce improved 
algorithms for the ranking of search results. The prototype 
with a full text and hyperlink database of web pages is 
available at [8]. In the current era there is much concern in 
using random graph models for the web. The Random 
Surfer model [9] and the Page Rank-based selection model 
[11] are described as two major models [10]. Page Rank- 
based selection model tries to capture the effect that the 
search engines have on the growth of the web by adding 
new links according to Page Rank. The Page Rank 
algorithm is used in the Google search engine [12] for 
ranking search results. PageRank is a link analysis 
algorithm used by the Google Internet search engine that 
assigns a numerical weighting to each element of a 
hyperlinked set of documents, such as the World Wide 
Web (WWW), with the purpose of "measuring" its 
relative importance within the set. Google is designed to 
be a scalable search engine with primary goal to provide 
high quality search results over a rapidly growing WWW 
[18]. The PageRank theory suggests that even an 
imaginary surfer who is randomly clicking on links will 
eventually stop clicking. The probability, at any step, that 
the surfer will continue is a damping factor d [2]. The 
damping factor (denoted α below) is eminently empirical, and 
in most cases the value of α can be taken as 0.85 [1]. Page 
Rank is the stationary state of a Markov chain [2, 7]. The 
chain is obtained by perturbing the transition matrix induced 
by a web graph with a damping factor that spreads uniformly 
over the rank. The behavior of Page Rank with respect to 
changes in α is useful in link-spam detection [3]. The 
mathematical analysis of Page Rank as α changes shows that, 
contrary to popular belief, for real-world graphs values of 
α close to 1 do not give a more meaningful ranking [2, 21]. 
The order of displayed web pages is computed by the search 
engine Google from the PageRank vector, whose entries are 
the Page Ranks of the web pages [4]. The Page Rank vector is 
the stationary distribution of a stochastic matrix, the 
Google matrix. The Google matrix in turn is a convex 
combination of two stochastic matrices: one matrix 
represents the link structure of the web graph, and a 
second, rank-one matrix mimics the random behavior of web 
surfers and can also be used to fight web spamming. As a 
consequence, Page Rank depends mainly on the link structure 
of the web graph, not on the contents of the web pages. 
Also, the Page Rank of the first vertex, the root of the 
graph, follows a power law [10]. However, the power 
undergoes a phase transition as the parameters of the model 
vary. 

Link-based ranking algorithms rank web pages by using the 
dominant eigenvector of certain matrices—like the co-citation 
matrix or its variations [17]. Distributed page ranking on top of 
structured peer-to-peer networks is needed because the size of 
the web grows at a remarkable speed and centralized page 
ranking is not scalable [5]. 

Page ranking can use propagation rates that depend on the 
types of the links and the user's specific set of interests [6]. Page 
filtering can be decided based on link types combined with 
some other information relevant to links. For ranking, a profile 
containing a set of ranking rules to be followed in the task can 
be specified to reflect user's specific interests [20]. 
Similarities of contents between hyperlinked pages are useful 
to produce a better global ranking of web pages [19]. 



III. CHALLENGES 



The primary focus of a Web Information Retrieval Support 
System (WIRSS) is to address the aspects of search that 
consider the specific needs and goals of the individuals 
conducting web searches [15]. The major goal is to provide 
high quality search results over a rapidly growing World 
Wide Web. Google employs a number of techniques to improve 
search quality, including page rank, anchor text, and 
proximity information. Decentralized content publishing is 
the main reason for the explosive growth of the web. For a 
given user query there are many documents that can be 
retrieved by a search engine, and every owner of a document 
wants to improve the ranking of that document. Commercial 
search engines have to maintain the integrity of their 
search results, and this is one reason why their efforts 
are not publicly available. Democratization of content 
creation on the web generates new challenges in WIRSS and 
raises the question of the integrity of web pages. In a 
simplistic approach, one might argue that only some 
publishers are trustworthy and others are not. A further 
challenge is that fast crawling technology is needed to 
gather the web objects and keep them up to date. 



IV. WEB_OBJECT_RANK ALGORITHM AND IMPLEMENTATION 

The Page Rank of a web object can be defined as the 
fraction of time that the surfer spends on average on 
that object. The probability that the random surfer visits 
a web page is its Page Rank [1]. Evidently, web objects 
that are hyperlinked by many other pages are visited more 
often. The random surfer gets bored and restarts from 
another random web object with a probability termed 
the moister factor (m). The probability that the surfer 
follows a randomly chosen outlink is (1 - m). 

A Markov chain is a discrete-time stochastic 
process: a process that occurs in a series of time-steps, 
in each of which a random choice is made [7]. There is one 
state corresponding to each web object. Hence, a Markov 
chain consists of N states if there are N web objects in 
the collection. A Markov chain is characterized by an 
N x N Probability Transition Matrix P, each of whose 
entries is in the interval [0, 1]; the entries in each 
row of P add up to 1. The Markov property states that each 
entry Pij is a transition probability that depends only on 
the current state i. A Markov chain's probability 
distribution over its states may be viewed as a 
Probability Vector, a vector all of whose entries are in 
the interval [0, 1] and add up to 1. References [7, 14] 
consider the problem of computing bounds on the 
conditional steady-state Probability Vector of a subset of 
states in finite, discrete-time Markov chains. 
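
As a small illustration of these definitions, the Python sketch below checks that a matrix is row-stochastic and advances a Probability Vector by one time-step. The three-state matrix `P` and the function names are hypothetical examples, not data from this paper.

```python
def is_row_stochastic(P, tol=1e-12):
    """True if all entries lie in [0, 1] and every row sums to 1."""
    return all(
        all(0.0 <= p <= 1.0 for p in row) and abs(sum(row) - 1.0) <= tol
        for row in P
    )


def step(x, P):
    """One time-step of the chain: the new vector is x multiplied by P."""
    n = len(P)
    return [sum(x[i] * P[i][j] for i in range(n)) for j in range(n)]


# Hypothetical 3-state chain; P[i][j] is the probability of moving i -> j.
P = [
    [0.0, 0.5, 0.5],
    [1.0, 0.0, 0.0],
    [0.5, 0.5, 0.0],
]
x0 = [1.0, 0.0, 0.0]   # the surfer starts in state 0 with certainty
x1 = step(x0, P)
print(is_row_stochastic(P), x1)
```

Because every row of P sums to 1, the entries of the vector returned by `step` again sum to 1, so it remains a Probability Vector.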

A. Web_Object_Rank Algorithm: Features 
The features of the Object Rank algorithm are as follows: 

• It is a query independent algorithm (it assigns a value 
to every document independent of the query). 

• It is a content independent algorithm. 

• It is concerned with the static quality of a web page. 

• The Object Rank value can be computed offline using 
only the web graph. 

• Object Rank is based upon the linking structure of 
the whole web. 

• Object Rank does not rank a website as a whole; it is 
determined for each web page individually. 

• The Object Ranks of the web pages Tj that link to page A 
do not influence the rank of page A uniformly. 

• The more outbound links a page T has, the less page A 
benefits from a link from T. 

• Object Rank is a model of the user's behavior. 

B. Web_Object_Rank Algorithm: Assumptions 

• If there are multiple links between two web objects, 
only a single edge is placed. 

• No self loops are allowed. 

• The edges could be weighted, but we assume that 
no weight is assigned to edges in the graph. 

• Links within the same web site are removed. 

• Isolated nodes are removed from the graph. 

C. Web_Object_Rank Algorithm 

This is a query independent algorithm that takes a web graph 
as input and assigns a rank to every object; the rank 
specifies the relative authority of that web page. The 
variables used in the proposed algorithm are listed below. 

moist_fact (m) is the moister factor: the probability that 
the random surfer restarts the search from another web object. 

1 - m is the probability that the random surfer continues 
the search through a randomly chosen outlink. 

outlinks is the number of web objects linked from a 
particular page. 

N is the number of objects in the domain. 

prob[i][j] is the Probability Transition Matrix for all 
i, j ∈ 1 to N. 

adj[i][j] is the Adjacency Matrix for all i, j ∈ 1 to N. 

x is the Probability Vector. 

itr is the iteration counter. 



D. Web_Object_Rank Algorithm 

Step 1. Create a web graph of the various objects in a 
        domain. 
Step 2. Set prob[i][j] = adj[i][j]. 
Step 3. Compute the number of outlinks from each node i, 
        say counter. 
        IF a web object has no outlinks 
        THEN prob[i][j] is distributed equally over all j 
        ELSE the prob values are distributed according to 
             the number of outlinks. 
        For all i, j: 
            IF (counter = 0) 
            THEN prob[i][j] = 1/N 
            ELSE IF (prob[i][j] = 1) 
                 THEN prob[i][j] = 1.0/counter 
Step 4. Multiply the resulting matrix by 1 - m. 
Step 5. Add m/N to every entry of the resulting matrix 
        to obtain the Probability Transition Matrix. 
        For all i, j DO 
            prob[i][j] = (prob[i][j] * (1 - m)) + (m/N) 
Step 6. Randomly select a node from 0 to N-1 at which to 
        start the walk, say s_int. 
Step 7. Initialize the random surfer, and set itr, which 
        counts the number of iterations required, to 0. 
Step 8. Try to reach the steady state within 200 
        iterations; otherwise toggling occurs. 
Step 9. Multiply the Probability Transition Matrix with 
        the Probability Vector to approach the steady state. 
Step 10. Check whether the system has entered the steady 
         state. 
Step 11. Print the ranks stored in the Probability Vector x 
         and EXIT. 
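
The steps above can be sketched in Python as follows. This is a minimal illustration, not the authors' C++ implementation. It makes several assumptions: the start node is passed in explicitly instead of being chosen at random, the steady state is detected with a small tolerance `tol` rather than exact equality, and the function name and the three-node example graph are hypothetical.

```python
def web_object_rank(adj, m, start=0, max_iter=200, tol=1e-9):
    """Return (ranks, iterations); iterations is None if no steady state."""
    n = len(adj)
    # Steps 2-5: build the Probability Transition Matrix.
    prob = []
    for i in range(n):
        counter = sum(1 for v in adj[i] if v)       # outlinks of node i
        if counter == 0:
            row = [1.0 / n] * n                     # no outlinks: uniform
        else:
            row = [1.0 / counter if v else 0.0 for v in adj[i]]
        prob.append([p * (1.0 - m) + m / n for p in row])
    # Steps 6-7: the walk starts at one object with probability 1.
    x = [0.0] * n
    x[start] = 1.0
    # Steps 8-10: iterate until two successive vectors agree.
    for itr in range(1, max_iter + 1):
        nxt = [sum(x[i] * prob[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, x)) < tol:
            return nxt, itr
        x = nxt
    return x, None                                  # toggling, no steady state


# Hypothetical 3-object cycle: O1 -> O2 -> O3 -> O1.
cycle = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
print(web_object_rank(cycle, 1.0))   # uniform ranks, the "ideal state"
print(web_object_rank(cycle, 0.0))   # never settles on this periodic graph
```

On this small periodic graph a moister factor of 0 makes the vector cycle between states without converging, which corresponds to the toggling behaviour, while a moister factor of 1 gives every object the same rank.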



V. IMPLEMENTATION 

This implementation is based upon the random surfer 
model [7] and Markov chains [13, 14]. The random surfer 
visits the objects in the web graph according to a 
distribution, and at any time it can be in one of the 
following four possible states. 

The initial state is the state from which the system 
starts its walk. The system is put into this state by 
randomly selecting an object using a random function; the 
value corresponding to that web object in the Probability 
Vector is set to unity, and the rest of the values in the 
Probability Vector are zero. The steady state is the state 
of the system in which the Probability Vector of the random 
surfer fulfills the properties of irreducibility and 
aperiodicity. To check whether the system has reached the 
steady state, two successive values of the Probability 
Vector must be the same. The ideal state is the state in 
which the system achieves the steady state but the web 
object ranks are distributed uniformly over all documents. 
The toggling state is reached when the system is not able 
to reach the steady state and just toggles between two sets 
of object ranks. 




Fig. 1. Web Graph 

C. Results and Discussion 

The web graph shown in Fig. 1 is used for analyzing various 
factors of the proposed algorithm. Varying the graph structure 
used for the analysis changes the performance of the algorithm. 
The graph shows 10 web objects in a domain that are interlinked 
as a strongly connected graph. Every pair of nodes in the graph 
is connected by a path with a small number of links. Oi is the 
i-th web object in the domain, where i varies from 1 to 10. The 
adjacency matrix for the web graph of Fig. 1 is shown in Fig. 2. 

Fig. 2. Adjacency Matrix for all i, j ∈ 1 

To analyze the convergence speed, the number of iterations 
required by the random surfer to reach a steady state is 
recorded in Table 1, and the corresponding graph is shown in 
Fig. 3. In Fig. 3 an infinite value is shown as a large number 
of iterations (200 or more). The results clearly show that as 
the moister factor approaches 1, the number of iterations is 
reduced. 

Table 1: Moister Factor vs No. of Iterations 

Moister Factor    No. of Iterations 
0                 Infinity 
0.05              Infinity 
0.1               Infinity 
0.15              Infinity 
0.2               83 
0.25              73 
0.3               62 
0.35              46 
0.4               41 
0.45              33 
0.5               35 
0.55              39 
0.6               24 
0.65              21 
0.7               20 
0.75              22 
0.8               16 
0.85              12 
0.9               11 
0.95              10 
1                 2 

Fig. 3. Moister Factor vs Number of Iterations 

Further analysis shows that when the moister factor is equal 
to 1, the random surfer enters the ideal state and the 
rank values of all web objects are the same, as shown in 
Table 2. The graph for the ideal state is shown in Fig. 4. 



Table 2: Ranks of objects at moister factor 1 

Object    Computed Rank 
O1        0.1 
O2        0.1 
O3        0.1 
O4        0.1 
O5        0.1 
O6        0.1 
O7        0.1 
O8        0.1 
O9        0.1 
O10       0.1 



Fig. 4. Random Surfer Ideal State (computed rank at moister 
factor 1 for each web object) 

Figure 5 shows that for a moister factor less than 0.2, no rank is
assigned to any web object and the system enters the toggling state,
with a large number of iterations for the given domain. The figure
also shows the ranks computed by the proposed algorithm for moister
factor values from 0.2 to 1.



Fig. 5. Computed object ranks at various moister factor values
(> 0.2) for different documents

From the above graphs and analysis, we can say that the moister
factor plays a central role in this algorithm, and that the
performance of the algorithm can be improved if this factor is
selected properly. The value of the moister factor can vary from 0
to 1, but in most cases the system enters the toggling state if the
selected value is less than 0.2, and at the value 1 the system enters
the ideal state, giving insignificant results. The value must
therefore be close to 1 but cannot be 1. As shown in Fig. 3, the
system reaches a steady state in fewer iterations when the moister
factor is closer to 1.
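
The dependence of the iteration count on the moister factor can be
explored with a small sketch. The update rule below is an assumption,
since this section does not restate the algorithm: the moister factor
m is treated as the weight of the uniform term,
r <- m/N + (1 - m) P^T r, which is one reading consistent with Table 2
(uniform ranks of 1/N at m = 1). The name rank_iterations, the
tolerance, the starting vector and the 200-iteration cap are
illustrative choices, not taken from the paper.

```python
import numpy as np


def rank_iterations(adj, m, tol=1e-6, max_iter=200):
    """Iterate r <- m/N + (1 - m) * P^T r (assumed update rule).

    Returns (ranks, iterations), with iterations set to None when the
    ranks have not settled within max_iter steps ("Infinity" in Table 1).
    """
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    out_degree = adj.sum(axis=1, keepdims=True)
    if np.any(out_degree == 0):
        raise ValueError("every object needs at least one out-link")
    p = adj / out_degree          # row-stochastic transition matrix
    r = np.zeros(n)
    r[0] = 1.0                    # the surfer starts on the first object
    for k in range(1, max_iter + 1):
        r_new = m / n + (1.0 - m) * (p.T @ r)
        if np.abs(r_new - r).sum() < tol:
            return r_new, k
        r = r_new
    return r, None
```

On a two-node cycle this sketch shows the same qualitative behaviour
as reported above: it toggles without converging at m = 0, and it
reaches the uniform ranking in two iterations at m = 1.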

CONCLUSION 

The current study was conducted to demonstrate how the link
structure of the web can be used to rank documents. This ranking can
be computed offline. With this approach, one can prioritize the
documents on the web independently of the query. However, a complete
score computation depends on various other factors. The proposed
algorithm uses a damping factor (the moister factor) that plays a
very important role in the behavior of the algorithm. The analysis
shows that the damping factor must not be selected close to zero. At
a damping factor of one, the system enters the ideal state and the
ranking provided is insignificant. According to the evaluation, the
damping factor must be greater than or equal to 0.5. However, if
convergence speed is considered as the only factor for evaluating
performance, then the best moister factor is 0.95. The proposed
algorithm is query independent and does not consider the query
during ranking.

ABSTRACT - This paper presents an artificial neural network method
using the functional back propagation algorithm (FUBPA) for
implementing concurrency control while developing the dial of a fork
using Autodesk Inventor 2008. Initially, the various parts and the
sequence in which they have to be drawn are decided. While
implementing concurrency control, this work uses locking to ensure
that associated parts cannot be accessed by more than one person.
The FUBPA learns the objects and the type of transactions to be done
based on which node in the output layer of the network exceeds a
threshold value. Learning stops once all the objects have been
exposed to the FUBPA. During testing, performance metrics are
analyzed.

Keywords: Concurrency Control, Functional 
Back Propagation Network, Transaction Locks, Time 
Stamping. 



I. INTRODUCTION 
Maintaining consistency in the transactions on objects while
drawing a large computer aided object is the result of efficient
concurrency control. In computer aided design (CAD), many persons
access different parts of the same object according to the type of
work allotted to each engineer. As all the parts of the same object
are stored in a single file, at no point in time should there be
corruption of data, inconsistency in storage, or total loss of data.

Locks are used for accessing objects. In a database operation, the
lock manager plays an important role in tracking whether one or more
transactions are reading or writing any part of an item I. This
information is part of the record kept for each item I. Access to I
is controlled by the lock manager, which ensures that no access
(read or write) causes a conflict. The lock manager can store the
current locks in a lock table, which consists of records with the
fields (<object>, <lock type>, <transaction>). The meaning of a
record (I, L, T) is that transaction T has a lock of type L on
object I [1-4].
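
A lock table of (object, lock type, transaction) records can be
sketched as follows. This is a minimal illustration and not the
paper's implementation: the class and method names are hypothetical,
and only shared (S) and exclusive (X) locks with the usual
compatibility rule (S is compatible with S, X with nothing) are
assumed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LockRecord:
    obj: str          # item I
    lock_type: str    # lock type L: "S" (shared) or "X" (exclusive)
    transaction: str  # transaction T holding the lock


class LockTable:
    """Hypothetical lock table holding (I, L, T) records."""

    def __init__(self):
        self.records = []

    def request(self, obj, lock_type, transaction):
        """Grant the lock and return True unless it conflicts."""
        for rec in self.records:
            if rec.obj != obj or rec.transaction == transaction:
                continue
            # Any pairing that involves an exclusive lock conflicts.
            if "X" in (rec.lock_type, lock_type):
                return False
        self.records.append(LockRecord(obj, lock_type, transaction))
        return True

    def release(self, transaction):
        """Drop every lock held by the given transaction."""
        self.records = [r for r in self.records
                        if r.transaction != transaction]
```

Two readers may hold S locks on the same item at once, while a writer
is refused until both readers have released their locks.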

The process of managing simultaneous operations on the database
without having them interfere with one another is called
concurrency control [5-8]. When two or more users access the
database simultaneously, concurrency control prevents interference.
Interleaving of operations may produce an incorrect result even
though the two transactions are individually correct. Some of the
problems that arise from concurrent access are lost update,
inconsistent analysis and uncommitted dependency.

II. PROBLEM DEFINITION 
It is difficult to provide consistency in the database when long
transactions are involved. It is not possible to identify whether
there is any violation of database consistency at the time of
commitment, and this cannot be known at all if the transaction has
an undefined time limit. There is no serializability when many users
work on shared objects. During long transactions, optimistic
transactions and two phase locking will result in deadlock. Two
phase locking forces transactions to hold locks on resources for a
long time, even after they have finished using them, so other
transactions that need to access the same resources are blocked. The
problem with the optimistic mechanism with Time Stamping is that it
causes repeated rollback of transactions when the rate of conflicts
increases significantly. An artificial neural network [9] with the
Functional Back Propagation Network (FUBPA) has been used to manage
the locks allotted to objects, so that locks are reclaimed
appropriately and allotted to other objects during subsequent
transactions.



Inbuilt library drawings for the dial of fork (Figure 1) are
available in AutoCAD. The fork is used in the front structure of two
wheelers. Due to customer requirements, the designer edits the dial
of fork in the central database by modifying different features.
Consistency of the data has to be maintained during the modification
of different features. The following sequences of locking objects
have to be applied whenever a particular user accesses a specific
feature of the dial of fork. Each feature is treated as an object.
The features are identified with numbers and corresponding feature
names. In this explanation, O1 refers to the object / feature marked
as 1. In general, the following sequences are formed when creating
the dial of fork. The major parameters involved in creating the dial
of fork are the hollow cylinder, the wedge and the swiveling plate.
The various constraints that have to be imposed during modification
of features by many users on this dial of fork are as follows:

• During development of features, hollow cylinder details should
not be changed.

• External rings are associated with the hollow cylinder.

• The circular wedge has a specific slope and is associated with
the hollow cylinder.

This dial of fork has the following entities.

1) Features 1, 2 (set 1)

2) Features 10, 11, 12, 13, 14 (set 2)

3) Features 5, 6, 7, 8 (set 3)

4) Features 3, 4 (set 4)

Figure 1. Dial of fork

1 Lower end, 2 Height of the end part, 3 External support, 4 Height
of the external support, 5 Support for the wedge, 6 Height of the
support for the wedge, 7 Wedge, 8 Thickness of the wedge, 9 Slope of
the wedge, 10 Wedge lock, 11 Height of the wedge lock, 12 Concentric
hole, 13 Separator, 14 Guideway



Set 1, set 2, set 3 and set 4 can be made into individual drawing
part files (part file 1, part file 2, part file 3 and part file 4)
and combined into one assembly file (containing part files 1, 2, 3
and 4, which remain intact). When users access individual part
files, transactions in part file 1 need not be concerned with the
type of transactions in part files 2, 3 and 4, and vice versa. When
part files 1, 2, 3 and 4 are combined into a single assembly file,
no inconsistency in the shape and dimension of set 1, set 2, set 3
and set 4 should occur during matching. Provisions can be made for
controlling the dimensions and shapes with upper and lower limits
conforming to standards. At any time, when a subsequent user tries
to access locked features, he can modify the features on his own
system and store them as an additional modified copy of the features
with Time Stamping and version names (allotted by the user or by the
system).

III. FUNCTIONAL UPDATE METHOD
When the network is trained with analog data, the number of
iterations needed for the objective function (J) to reach the
desired mean squared error (MSE) is large. The objective function
may not reach the desired MSE because of local minima whose domains
of attraction are as large as that of the global minimum. The
network converges to one of those local minima, or the network
diverges. The updating of the weights will not stop unless every
input is outside the significant update region (0.1 to 0.9), and the
outputs of the network will be approaching either 0 or 1. This
requires many iterations for the network to converge. To overcome
these difficulties, a functional criterion, which results in faster
convergence of the network, is used.

The main idea of this method is that the weights of the network are
updated only when at least one of the nodes in the output layer of
the network is misclassified. If not even one of the nodes in the
output layer is misclassified, no updating of weights is done. A
node in the output layer is misclassified if the difference between
the desired output and the network output is greater than 0.5.

The number of layers and the number of nodes in the hidden layers
are decided. The weights among layers are initialized. A training
pattern is presented to the input layer of the network, and the
difference between the network's output and the target output is
calculated for each node in the output layer. If the difference
obtained for a node is greater than the value of the functional
criterion, a counter is incremented and the weights are updated. If
the difference of not even one node is greater than 0.5, no updating
of the weights is done. The MSE of the network for each pattern is
calculated only when at least one node in the output of the network
is misclassified. The remaining training patterns are then presented
to the network. Training of the network is stopped when the
performance index of the network is reached.

The algorithm for the functional update is as follows:

Step 1: The weights and thresholds of the network are initialized.

Step 2: The inputs and outputs of a pattern are presented to the
network.

Step 3: The output of each node in the successive layers is
calculated by:

$$O = \frac{1}{1 + \exp\left(-\sum_{j} w_{ij}\, x_j\right)} \qquad (1)$$

where O is the output of a node.

Step 4: The number of nodes in the output layer that are
misclassified is denoted by 'nm'. A node is misclassified if it
satisfies the equation:

$$1 - \varepsilon > D > 0.5 \qquad (2)$$

where ε is a value fixed by the programmer, and

$$D = |\,\text{Desired output} - \text{Network output}\,| \qquad (3)$$

If 'nm' is empty, i.e., not even one node satisfies equation 2,
step 2 is adopted.

Step 5: If 'nm' is not empty, the objective function J is computed
by:



'■iSI D ' 



(4) 



Step 6: The weights and thresholds are updated. 

Step 7: Steps 2 to 6 are repeated until the total MSE of all the
patterns is below a specified value.
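
The update rule in steps 3 to 6 can be sketched for a single layer of
sigmoid nodes. This is an illustration under stated assumptions, not
the authors' Matlab implementation: the function names are
hypothetical, the hidden layers are omitted, the misclassification
test uses the band 0.5 < D < 1 - eps, and the weight change assumes a
squared-error objective, because the exact form of the objective
function is not reproduced here.

```python
import numpy as np


def sigmoid(z):
    """Node output: O = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))


def misclassified(target, output, eps=0.05):
    """Boolean mask of output nodes with 0.5 < D < 1 - eps."""
    d = np.abs(target - output)
    return (d > 0.5) & (d < 1.0 - eps)


def functional_update_step(weights, x, target, lr=0.5, eps=0.05):
    """Present one pattern; change weights only if a node is misclassified.

    Returns (weights, updated). The gradient assumes J = 0.5 * sum(D^2),
    which is an assumption made for this sketch.
    """
    output = sigmoid(weights @ x)
    if not misclassified(target, output, eps).any():
        return weights, False          # 'nm' is empty: go to next pattern
    delta = (output - target) * output * (1.0 - output)
    return weights - lr * np.outer(delta, x), True
```

With all-zero weights every output is exactly 0.5, so D is not greater
than 0.5 and the pattern is skipped without any weight change. This
skipping of patterns is what reduces the number of updates.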

IV. RESULTS AND DISCUSSION 
Let us assume that there are two users editing the dial of the
fork. User1 edits O1 and hence O2 will be locked sequentially
(Table 1). Immediately afterwards, user2 wants to edit O2; however,
the transaction is not granted because O2 is already locked. User2
or any other user can still try to access O3 to O14.



Table 1: Shape and dimension consistency management

Group    First feature    Remaining features to be locked
G1       1                2
G2       10               11, 12, 13, 14
G3       5                6, 7, 8
G4       3                4
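
The group locking behaviour of Table 1 can be sketched as follows. The
helper names are hypothetical, and the sketch assumes that accessing
any feature of a group locks the whole group for that user, which is
one reading of the table.

```python
# Feature groups from Table 1 (first feature followed by the rest).
GROUPS = {
    "G1": [1, 2],
    "G2": [10, 11, 12, 13, 14],
    "G3": [5, 6, 7, 8],
    "G4": [3, 4],
}


def features_to_lock(feature):
    """Return every feature that must be locked along with `feature`."""
    for members in GROUPS.values():
        if feature in members:
            return list(members)
    return [feature]  # a feature outside all groups is locked alone


def try_edit(locked, user, feature):
    """Lock the feature's group for `user`; refuse if another user holds it.

    `locked` maps a feature number to the user currently holding it.
    """
    wanted = features_to_lock(feature)
    if any(locked.get(f, user) != user for f in wanted):
        return False
    for f in wanted:
        locked[f] = user
    return True
```

In the scenario above, user1 editing feature 1 also locks feature 2,
so user2 is refused on feature 2 but can still obtain features 3
and 4.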



The variables used for training the ANN about the locks assigned
to different objects are transaction id, object id and lock mode
(Table 2).

Transaction id represents the client or any other intermediate
transaction. Object id represents the entire feature or an entity in
the file. Mode represents the type of lock assigned to an object.

In Table 2, column 1 gives the lock type, column 2 gives the value
to be used in the input layer of the FUBPA, and column 3 gives the
binary representation of the lock type to be used in the output
layer of the FUBPA. The values are used as target outputs in the
module during lock release on a data item.



Table 2: Binary representation of lock type

Lock type            Input layer representation    Binary representation in
                     (numerical value)             target layer of the FUBPA
Object not locked    0                             000
S                    1                             001
X                    2                             010
IS                   3                             011
IX                   4                             100
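
The encoding in Table 2 can be expressed as a small lookup. The names
below are hypothetical, "NL" is an illustrative label for "Object not
locked", and the 0.5 decision threshold is an assumption.

```python
# Numerical value of each lock type (column 2 of Table 2).
LOCK_CODES = {"NL": 0, "S": 1, "X": 2, "IS": 3, "IX": 4}


def target_bits(lock_type):
    """Return the 3-bit target pattern for a lock type, e.g. X -> [0, 1, 0]."""
    code = LOCK_CODES[lock_type]
    return [(code >> shift) & 1 for shift in (2, 1, 0)]


def decode_bits(outputs, threshold=0.5):
    """Map three output-node activations back to a lock type, or None."""
    bits = [1 if o > threshold else 0 for o in outputs]
    code = bits[0] * 4 + bits[1] * 2 + bits[2]
    for name, value in LOCK_CODES.items():
        if value == code:
            return name
    return None  # bit pattern not used in Table 2
```

For a write (X) lock, target_bits("X") gives [0, 1, 0], the same
target pattern that appears in the training tables of this section.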



Initially, user 1 and user 2 have opened the same dial of fork file
from the common database. The following steps show the sequence of
execution and the results. T1 edits O1 in write mode. Table 3 shows
the pattern formed for the training.



Table 3: First time pattern used for training FUBPA

Object number    Input pattern    Target output pattern
O1               [1 1]            [0 1 0]



Step 1: The transaction manager locks the objects mentioned in the
third column of Table 1. Step 1 is repeated with the patterns given
in Table 4.

Table 4: Additional patterns used for training OML FUBPA

Object number    Input pattern    Target output pattern
O1               [1 1]            [0 1 0]
O2               [2 1]            [0 1 0]



Step 2: A new transaction T2 accesses O2. A pattern is formed to
verify whether a lock has been assigned to O2 and its associated
object O1. T2 is allowed only when no locks are assigned to O2
and O1.

The following input patterns are presented to the testing module,
which uses the final weights obtained during training, to find
whether the output [0 0 0] is obtained in the output layer. If it is
not, a lock has been assigned to O2 (or to its associated object
O1), and the transaction is denied for T2. Otherwise, the patterns
in Table 5 are presented in step 1.



Table 5: Additional patterns used for training FUBPA

Object number    Input pattern    Target output pattern
O1               [1 1]            [0 1 0]
O2               [2 1]            [0 1 0]
O3               [3 1]            [0 1 0]
O4               [4 1]            [0 1 0]
O5               [5 1]            [0 1 0]
O6               [6 1]            [0 1 0]
O7               [7 1]            [0 1 0]
O8               [8 1]            [0 1 0]



Step 3: Testing is used to determine the type of lock assigned to
an object for a transaction. Testing uses the final weights produced
by training. The proposed FUBPA for lock state learning and lock
state finding has been implemented using Matlab 7.



V. CONCLUSION 

An artificial neural network with FUBPA has been implemented for
providing concurrency control to maintain consistency in the CAD
database. A dial of fork containing 14 objects has been considered.
The 14 objects have been categorized into 4 groups. The transaction
behavior and concurrency control of the two users on the 14 objects
have been controlled using the FUBPA network. Compared with the
conventional method, the memory required by the neural network
method depends on the topology used for storing objects and their
transactions.

Abstract - Geological surveying presently uses methods and tools
for the computer modeling of 3D structures of the geographical
subsurface and for geotechnical characterization, as well as
geoinformation systems for the management and analysis of spatial
data and their cartographic presentation. The objective of this
paper is to present a 3D geological surface model of Latur district
in Maharashtra state of India. The study is carried out through
several processes, discussed in this paper, that generate and
visualize the automated 3D geological surface model of a projected
area.

Keywords: 3D Visualization, Geographical Information System,
Digital Terrain Data Processing, Cartography.

I. Introduction 

Traditional geological maps, which illustrate the distribution and
orientation of geological structures and materials on a
two-dimensional (2D) ground surface, are no longer sufficient for
storing, displaying, and analysing geological information. It is
also difficult and expensive to update traditional maps that cover
large areas. Many kinds of raster and vector based models for
describing, modelling, and visualizing 3D spatial data have been
developed. Meanwhile, with the fast development of sensor techniques
and computer methods, several types of airborne or close range laser
scanners have become available for acquiring 3D surface data in real
time or very quickly. Several types of digital photogrammetry
workstations are also available for semi-automatic interpretation of
complicated man-made 3D surfaces. However, due to image noise and
the limited resolution of current laser range data, many existing
techniques still need to be extended to fit real applications.



This paper presents a fast and efficient method to automate the
generation of 3D geological surfaces from 2D geological maps. The
method was designed to meet the requirements of creating a
three-dimensional (3D) geologic map model of Latur district in
Maharashtra state of India. The LULC (Land Use and Land Cover)
database [11] of the National Remote Sensing Centre, ISRO, India,
for Latur district has been used for the visualization experiments.
The elevation data for Latur district is obtained from the USGS
(United States Geological Survey) Seamless server database [10] and
is used for the digital elevation modelling (DEM) experiments.



II. Study area 

Latur district is in the south-eastern part of Maharashtra state in
India. Latur is a major city and district of the state, well known
for its quality of education, administration, food grain trade and
oil mills. Latur district has an ancient historical background. King
'Amoghvarsha' of the Rashtrakutas developed Latur city, originally
the native place of the Rashtrakutas. The Rashtrakutas, who
succeeded the Chalukyas of Badami in 753 A.D., called themselves the
residents of Lattalut. The district is divided into three
sub-divisions and 10 talukas (sub-districts) [1].




Figure 1. A false color composite image of India acquired by SPOT
and IKONOS, showing the location of Latur district (courtesy NRSA
Hyd.).

Latur is located at 18°24'N 76°35'E (18.4°N, 76.58°E), as shown in
Fig. 1. It has an average elevation of 631 meters (2070 feet). It is
situated 636 meters above mean sea level. The district lies on the
Maharashtra-Karnataka boundary. On the eastern side of Latur is
Bidar district of Karnataka, whereas Nanded is to the northeast,
Parbhani district to the north, Beed to the northwest and Osmanabad
to the west and south. The entire district of Latur is situated on
the Balaghat plateau, 540 to 638 meters above mean sea level.

III. Automated 3D Surface Model 

3D geological information systems provide a means to capture,
model, manipulate, retrieve, analyse, and present geological
situations. Traditional geological maps, which illustrate the
distribution and orientation of geological materials and structures
on a 2D ground surface, provide vast amounts of raw data. It is thus
vital to develop a set of intelligent maps that show features of
geological formations and their relationships [2].

A. Digital Elevation Model of Latur district 

DEM is a representation of the terrain surface by coordinates and
numerical descriptions of altitude. DEM is easy to store and
manipulate, and it gives a smoother, more natural appearance of
derived terrain features. The created DEM is therefore the
foundation of 3D geological maps, since the z-coordinates of the
vertices of geological formations can be interpolated from it. The
data consist of 4 topographical map sheets, with 3D coordinates of
terrain, contour lines, and other information. The maps are in
GeoTIFF format at a scale of 1:150000 (Fig. 2). These DEMs were then
integrated into a whole DEM of Latur using the Global Mapper
software. The final gridded DEM data with 5-metre intervals for
Latur district was obtained (Fig. 2). The file size is about
4.83 MB.

B. Cropping DEMs using Latur district base map shape file

After integrating the DEM tiles, the next process is to extract
(crop) the required region of Latur district from the integrated
DEMs using the Latur district base map shape file. For this process,
we use the software Global Mapper v11 to crop the DEMs so that only
the required region's terrain data is kept. The remaining area is
treated as null data, as shown in Fig. 3.




Figure 3 . Cropped DEM using Latur district base map. 

C. Accessing and concatenating DEMs in MATLAB 

After the successful cropping of all the DEM data sheets (tiles),
we import them into MATLAB for further processing. The DEMs can be
converted into any DTED (Digital Terrain Elevation Data) level
(0, 1, 2, ...) and then imported into MATLAB. DTED level 0 files
have 121-by-121 points and DTED level 1 files have 1201-by-1201
points. The edges of adjacent tiles have redundant records.

Acquiring all the data sheets with their specified location
(projection), and keeping the data sheets in the correct sequence,
is very important here.

Concatenation of the DEM tiles with respect to their locations
requires both horizontal and vertical concatenation.

1) Horizontal Concatenation

First, we concatenate the matrices of the top-left and top-right
tiles (Fig. 2), i.e. horizontal concatenation:

H1 = horzcat(TL, TR)    (1)

where H1 is the concatenated matrix of the top-left (TL) and
top-right (TR) matrices.

Figure 2. Tiled DEM of Latur District (courtesy USGS).






http://sites.google.com/site/ijcsis/ 
ISSN 1947-5500 



Next, we concatenate the matrices of the bottom-left and
bottom-right tiles, again by horizontal concatenation:

H2 = horzcat(BL, BR)    (2)

where H2 is the concatenated matrix of the bottom-left (BL) and
bottom-right (BR) matrices.

2) Vertical Concatenation

Next, we concatenate the H1 and H2 matrices vertically, i.e.

H = vertcat(H1, H2)    (3)

where H is the complete concatenated matrix of H1 and H2.
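
The same tile assembly can be sketched in Python with NumPy, where
hstack and vstack play the role of horzcat and vertcat. The function
name and the optional removal of shared edges are illustrative
assumptions: the sketch supposes that adjacent tiles repeat exactly
one row or column of records along their common edge.

```python
import numpy as np


def concat_tiles(tl, tr, bl, br, drop_shared_edges=False):
    """Assemble four DEM tiles into one matrix H.

    tl, tr, bl, br are the top-left, top-right, bottom-left and
    bottom-right elevation matrices. When drop_shared_edges is True,
    the first column of the right-hand tiles and the first row of the
    bottom tiles are dropped (assumed single-record overlap).
    """
    if drop_shared_edges:
        tr, br = tr[:, 1:], br[:, 1:]
        bl, br = bl[1:, :], br[1:, :]
    h1 = np.hstack((tl, tr))     # H1 = horzcat(TL, TR)
    h2 = np.hstack((bl, br))     # H2 = horzcat(BL, BR)
    return np.vstack((h1, h2))   # H  = vertcat(H1, H2)
```

Four 3-by-3 tiles give a 6-by-6 matrix, or a 5-by-5 matrix when the
shared edges are dropped.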
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The shading function sets the shading. If the shading is 
interpolated, C must be of the same size as X, Y, and Z; it
specifies the colors at the vertices. The color within a surface
patch is a bilinear function of the local coordinates. If the
shading is faceted (the default) or flat, C(i,j) specifies the
constant color in the surface patch:

 (i,j)    -   (i,j+1)
   |   C(i,j)    |
 (i+1,j)  -  (i+1,j+1)          (5)

In this case, C can be the same size as X, Y, and Z and its last
row and column are ignored. Alternatively, its row and column
dimensions can be one less than those of X, Y, and Z.



D. Visualizing 3D geographical surface model 

A workflow was chosen that applies, on the one hand, GIS methods
using ESRI shape files and the Global Mapper software for data
acquisition, maintenance, and presentation and, on the other hand,
three-dimensional spatial modelling with interactive 3D modelling in
MATLAB. Based on Non-Uniform Rational data, any geometric shape can
be modelled. Besides surfaces of the different engineering
geological units, solids using boundary representation techniques
were modelled [3]. In MATLAB, one of the easiest ways to visualize
well defined projected data sets in a 3D view is to use the
functions surf() and mesh(). To visualize the acquired projected
data set over a rectangular region, we need to create colored
parametric surfaces specified by X, Y, and Z, with color specified
by Z.

A parametric surface is parameterized by two independent variables,
i and j, which vary continuously over a rectangle; for example,
1 <= i <= m and 1 <= j <= n. The three functions x(i,j), y(i,j), and
z(i,j) specify the surface. When i and j are integer values, they
define a rectangular grid with integer grid points. The functions
x(i,j), y(i,j), and z(i,j) become three m-by-n matrices, X, Y, and
Z. Surface color is a fourth function, c(i,j), denoted by matrix C.
Each point in the rectangular grid can be thought of as connected to
its four nearest neighbours [6].



            (i-1,j)
               |
 (i,j-1)  -  (i,j)  -  (i,j+1)
               |
            (i+1,j)               (4)



Surface color can be specified in two different ways: at the 
vertices or at the centers of each patch. In this general setting, 
the surface need not be a single-valued function of x and y. 
Moreover, the four-sided surface patches need not be planar. 
For example, one can have surfaces defined in polar, 
cylindrical, and spherical coordinate systems [8]. 



E. Assigning axes to 3D model 

MATLAB automatically creates axes, if they do not already exist,
when you issue a command that creates a graph, but the default axes
assigned by MATLAB do not match the real coordinate system of the
projected area.

The existing model is built with data on 3 axes: x, y and z. The X
and Y axes represent the longitude and latitude values for this
model, respectively, i.e.

UPPER LEFT X  = 76.2076079218
UPPER LEFT Y  = 18.8385493143
LOWER RIGHT X = 77.2934412815
LOWER RIGHT Y = 17.8677159574

WEST LONGITUDE = 76° 12' 27.3885" E
NORTH LATITUDE = 18° 50' 18.7775" N
EAST LONGITUDE = 77° 17' 36.3886" E
SOUTH LATITUDE = 17° 52' 3.7774" N

The values shown above are associated with all four tiles of the
DTED files. The Z axis represents the terrain (height) values of
ground surface objects. In this model the elevation data is
expressed in feet, i.e. 0 to 3000 feet.
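
Axis coordinates that match the corner values listed above can be
generated for a DEM matrix as sketched below. The function name is
hypothetical, and the sketch assumes that the grid is evenly spaced
in longitude and latitude, with the first row at the northern edge
and the first column at the western edge.

```python
import numpy as np

# Corner coordinates quoted above (longitude, latitude) in degrees.
UPPER_LEFT = (76.2076079218, 18.8385493143)
LOWER_RIGHT = (77.2934412815, 17.8677159574)


def geographic_axes(z):
    """Return longitude and latitude grids with the same shape as z."""
    rows, cols = z.shape
    lon = np.linspace(UPPER_LEFT[0], LOWER_RIGHT[0], cols)  # west -> east
    lat = np.linspace(UPPER_LEFT[1], LOWER_RIGHT[1], rows)  # north -> south
    x, y = np.meshgrid(lon, lat)
    return x, y
```

The resulting X and Y matrices can be passed, together with the
elevation matrix Z, to a surface plotting routine so that the axes
show geographic coordinates instead of matrix indices.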



IV. Results 

With reference to the processes discussed above, the 3D
visualization experimental results for the 3D model of the Latur
district geological surface are shown in Figs. 4, 5 and 6.

Figure 4. A 70° camera view point of surface model with gray color
scheme.

Figure 5. A 45° camera view point of surface model with HSV color
scheme.

Figure 6. A true color composite scheme (Atlas shader) 3D model.



V. Conclusions and future work 

Some key processes for automated 3D geological surface modeling,
such as data acquisition, concatenation, 3D surface modeling and
axes data management, have been presented. The visualization
experiments were done using data for Latur district. In future work,
we will attempt to overlay real time map layers on this 3D surface
model.

Abstract- This paper presents the innovative idea of sectorization
of Haar Wavelet transformed images and Kekre's Wavelet transformed
images to extract features for image retrieval. Transformed images
have been sectored into 4, 8, 12 and 16 sectors. Each sector
produces one feature vector component for the given sector size;
thus the feature vector size increases with the number of sectors.
An experiment of augmenting the feature vectors with extra
components has also been performed. The performance of the proposed
sectorization method is checked with respect to the increase in
sector sizes and the effect of augmenting extra components, in both
Haar and Kekre's Wavelet sectorization. The retrieval rate is
checked with the crossover of average precision and recall. LIRS and
LSRR are calculated for the average of 5 randomly selected images
from each of the 12 classes and compared with the overall average of
LIRS/LSRR. The work has been tested on an image database of 1055
images, and the performance of image retrieval is measured with
respect to two similarity measures, namely Euclidean distance (ED)
and sum of absolute difference (AD).

Keywords- CBIR, Haar Wavelet, Kekre's Wavelet, Euclidean Distance,
Sum of Absolute Difference, LIRS, LSRR, Precision and Recall



I. INTRODUCTION



The digital world of the current era needs storage and management
of bulky digital images. There is a need for better mechanisms to
store, manage and, whenever needed, retrieve digital images from
large databases. Content-based image retrieval (CBIR) [1-4] is any
technology that in principle helps to achieve this goal by using the
visual content of the images. By this definition, anything ranging
from an image similarity function to a robust image annotation
engine falls under CBIR. This characterization of CBIR as a field of
study places it at a unique juncture within the scientific
community. It is believed that the current state-of-the-art in CBIR
holds enough promise and maturity to be useful for real-world
applications if aggressive attempts are made. For example, many
commercial organizations are working on image retrieval despite the
fact that robust text understanding is still an open problem. Of
late, there is renewed interest in the media about potential
real-world applications of CBIR and image analysis technologies.
Various approaches have been tried to generate efficient algorithms
for CBIR, such as FFT, DCT, DST and WALSH sectors [8-14][21][22],
transforms [16][17], vector quantization [17], and bit truncation
coding [18][19].

The problem of CBIR still needs a lot of research to achieve better
retrieval performance. It needs extensive experiments on all of its
parameters, i.e. feature extraction, similarity measures, and
retrieval performance measuring parameters.

In this paper we introduce a novel concept of sectorization of Haar
Wavelet and Kekre's Wavelet transformed color images, transformed
both column wise and row wise, for feature extraction (FE). Two
different similarity measures, namely sum of absolute difference and
Euclidean distance, are considered. Average precision, recall, LIRS
and LSRR are used to study the performance of these approaches.

II. HAAR WAVELET [5] 

The Haar transform is derived from the Haar matrix. The Haar
transform is separable and can be expressed in matrix form as

$$[F] = [H]\,[f]\,[H]^{T}$$

where f is an NxN image, H is an NxN Haar transform matrix and F is
the resulting NxN transformed image. The transformation H contains
the Haar basis functions h_k(t), which are defined over the
continuous closed interval t ∈ [0,1].
The Haar basis functions are:

• When k = 0, the Haar function is defined as a constant

$$h_0(t) = \frac{1}{\sqrt{N}}$$

• When k > 0, the Haar function is defined as

$$h_k(t) = \frac{1}{\sqrt{N}} \begin{cases} 2^{p/2}, & (q-1)/2^{p} \le t < (q-0.5)/2^{p} \\ -2^{p/2}, & (q-0.5)/2^{p} \le t < q/2^{p} \\ 0, & \text{otherwise} \end{cases} \qquad (1)$$

where 0 ≤ p < log_2 N and 1 ≤ q ≤ 2^p.
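
The basis functions above can be sampled to build an NxN Haar matrix.
The sketch below is illustrative: the function name is hypothetical,
the functions are sampled at t = 0, 1/N, ..., (N-1)/N, and the rows
are ordered by the commonly used index relation k = 2^p + q - 1, which
is an assumption not stated in this section.

```python
import numpy as np


def haar_matrix(n):
    """Return the n-by-n Haar transform matrix (n must be a power of two)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    t = np.arange(n) / n               # sample points in [0, 1)
    h = np.zeros((n, n))
    h[0, :] = 1.0                      # k = 0: constant function
    k, p = 1, 0
    while 2 ** p < n:
        for q in range(1, 2 ** p + 1):
            low = (q - 1) / 2 ** p
            mid = (q - 0.5) / 2 ** p
            high = q / 2 ** p
            h[k, (t >= low) & (t < mid)] = 2 ** (p / 2)
            h[k, (t >= mid) & (t < high)] = -(2 ** (p / 2))
            k += 1
        p += 1
    return h / np.sqrt(n)              # common 1/sqrt(N) factor


def haar_transform(f):
    """Apply the separable transform F = H f H^T to a square image f."""
    h = haar_matrix(f.shape[0])
    return h @ f @ h.T
```

Because the rows are orthonormal, the image can be recovered from F as
H^T F H.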

III. KEKRE'S WAVELET [5] 

Kekre's Wavelet transform is derived from Kekre's 
transform. From NxN Kekre's transform matrix, we can 
generate Kekre's Wavelet transform matrices of size 

(2N)x(2N), (3N)x(3N), , (N2)x(N2). For example, from 

5x5. Kekre's transform matrix, we can generate Kekre's 
Wavelet transform matrices of size 10x10, 15x15, 20x20 
and 25x25. In general MxM Kekre's Wavelet transform 
matrix can be generated from NxN Kekre's transform 
matrix, such that M = N * P where P is any integer between 
2 and N that is, 2 < P < N. Kekre's Wavelet Transform 
matrix satisfies [K][K]t = [D] Where D is the diagonal 
matrix this property and hence it is orthogonal. The diagonal 
matrix value of Kekre's transform matrix of size NxN can be 
computed as 

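$$D(x,y) = \begin{cases}
2, & \text{if } x = y = N \\
N, & \text{if } x = y = 1 \\
0, & \text{if } x \ne y \\
D(x+1,y+1) + 2(N-x+1), & \text{if } x = y = p,\ p \ne 1 \text{ and } p \ne N
\end{cases} \qquad (2)$$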
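
The recurrence in Equation (2) can be checked numerically. The sketch below assumes the usual definition of Kekre's transform matrix from the wider Kekre transform literature (K(x,y) = 1 for x <= y, -N + (x-1) for x = y+1, and 0 otherwise, with 1-based indices); that definition is not stated in this section, so treat it as an assumption. The function names are illustrative only.

```python
def kekre_matrix(n):
    """Assumed Kekre's transform matrix (1-based rule, returned as 0-based rows)."""
    k = []
    for x in range(1, n + 1):
        row = []
        for y in range(1, n + 1):
            if x <= y:
                row.append(1)
            elif x == y + 1:
                row.append(-n + (x - 1))
            else:
                row.append(0)
        k.append(row)
    return k


def gram(k):
    """Return K K^T as a list of rows."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in k] for r1 in k]


def diagonal_by_recurrence(n):
    """Diagonal of D from Equation (2): D(1,1)=N, D(N,N)=2, else recurrence."""
    d = [0] * (n + 2)          # 1-based storage, d[x] holds D(x, x)
    d[1] = n
    if n > 1:
        d[n] = 2
    for x in range(n - 1, 1, -1):
        d[x] = d[x + 1] + 2 * (n - x + 1)
    return d[1:n + 1]


g = gram(kekre_matrix(5))
print([g[i][i] for i in range(5)])     # diagonal of K K^T
print(diagonal_by_recurrence(5))       # same values from Equation (2)
```

Under the assumed matrix definition, both lines print [5, 20, 12, 6, 2] for N = 5, and all off-diagonal entries of K K^T are zero.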



IV. SECTORIZATION OF TRANSFORMED IMAGES [8-14]



A. 4 Sector Formation 

Even and odd rows/columns of the transformed images are
checked for sign changes, based on which four sectors are
formed, as shown in Figure 1 below:

Computation of 4 Sectors

Sign of Even    Sign of Odd     Quadrant Assigned
Row/Column      Row/Column
+               +               I (0-90°)
+               -               II (90-180°)
-               -               III (180-270°)
-               +               IV (270-360°)

Figure 1: Computation of 4 Sectors
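
The sign test behind the 4 sector formation can be sketched as follows. Here `a` stands for a coefficient from an even row/column and `b` for the corresponding coefficient from an odd row/column. The sign-to-quadrant mapping is one reading of Figure 1, zero is treated as positive, and the per-sector feature (mean of the averaged magnitudes) is purely hypothetical; none of these choices is confirmed by the paper.

```python
def quadrant(a, b):
    """Map the signs of (even, odd) coefficients to a quadrant 1..4 (assumed mapping)."""
    if a >= 0 and b >= 0:
        return 1        # (+, +) -> I
    if a >= 0 and b < 0:
        return 2        # (+, -) -> II
    if a < 0 and b < 0:
        return 3        # (-, -) -> III
    return 4            # (-, +) -> IV


def four_sector_feature(pairs):
    """Hypothetical feature vector: mean of (|a| + |b|) / 2 in each of the 4 sectors."""
    sums = [0.0] * 4
    counts = [0] * 4
    for a, b in pairs:
        s = quadrant(a, b) - 1
        sums[s] += (abs(a) + abs(b)) / 2
        counts[s] += 1
    return [sums[i] / counts[i] if counts[i] else 0.0 for i in range(4)]
```

The 8, 12 and 16 sector cases would refine each quadrant further using magnitude comparisons between the even and odd coefficients.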

B. 8 Sector Formation

The transformed image, already divided into 4 sectors, is
further divided into 8 sectors, each spanning an angle of
45°. Coefficients of the transformed image lying in a
particular sector are checked against the sectorization
conditions shown in Figure 2.

Computation of 8 Sectors

Sectors               Conditions
I, IV, V, VIII        |A| >= |B|
II, III, VI, VII      |B| >= |A|

Where
A = Even Row / Column of Transformed Image
B = Odd Row / Column of Transformed Image

Figure 2: Computation of 8 Sectors

C. 12 Sector Formation

Dividing each of the 4 sectors into angles of 30° forms
12 sectors of the transformed image. Coefficients of the
transformed image are assigned to the various sectors
based on the inequalities shown in Figure 3.

Computation of 12 Sectors

Sectors               Conditions
I, IV, VII, X         |A| >= √3·|B|
II, V, VIII, XI       (1/√3)·|A| <= |B| <= √3·|A|
III, VI, IX, XII      Otherwise

Where
A = Even Row / Column of Transformed Image
B = Odd Row / Column of Transformed Image

Figure 3: Computation of 12 Sectors

D. 16 Sector Formation

Similarly, we have derived the inequalities that form the
16 sectors of the transformed image. The even/odd
rows/columns are assigned to particular sectors for
feature vector generation.



V. EXPERIMENTAL RESULTS 

We have used the augmented Wang image database [2]. The
image database consists of 1055 images of 12 different
classes: Flower, Sunset, Barbie, Tribal, Cartoon,
Elephant, Dinosaur, Bus, Scenery, Monuments, Horses and
Beach. The class wise distribution of all images in the
database is shown in Figure 4.

Figure 4: Class wise distribution of images in the image
database




Figure 5: Query Image

The query image of the class Horse is shown in Figure 5.
For this query image, the retrieval results of both column
wise and row wise Haar and Kekre's wavelet transformed
images are checked for all sectors. Figure 6 shows the first
20 retrievals for the query image using row wise Haar
Wavelet sectorization with 16 sectors and sum of absolute
difference as the similarity measure. It can be observed that
the first 20 retrieved images are all of the relevant class,
i.e. Horse; there are no irrelevant images up to the first 45
retrievals in both cases. The result for row wise Kekre's
Wavelet is shown in Figure 7; the first 20 retrieved images
are the same as for the Haar Wavelet, except that the order
of retrieval changes.














Figure 6: First 20 Retrieved Images of Row wise Haar 
wavelet (16 Sectors) 
























Figure 7: First 20 Retrieved Images of Row wise Kekre's 
Wavelet Sectorization (16 Sectors). 

Once the feature vectors are generated for all images in the
database, a feature database is created. Five randomly chosen
query images of each class are used to search the database.
The image with an exact match gives the minimum absolute
difference and Euclidean distance. To check the effectiveness
of the work and its retrieval performance, we have calculated
the overall average precision and recall as given in Equations
(3) and (4) below. Two new parameters, LIRS and LSRR, are
introduced as shown in Equations (5) and (6).
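
The matching step described above can be sketched as follows, using the two similarity measures named in the text and the standard definitions of precision (relevant images retrieved divided by images retrieved) and recall (relevant images retrieved divided by relevant images in the database). The function names, the list-based feature vectors and the class labels are assumptions made for illustration; LIRS and LSRR are not sketched here.

```python
import math


def sum_abs_diff(u, v):
    """Sum of absolute differences (AD) between two feature vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


def euclidean(u, v):
    """Euclidean distance (ED) between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def rank(query, database, distance):
    """Return database indices ordered by increasing distance to the query."""
    return sorted(range(len(database)), key=lambda i: distance(query, database[i]))


def precision_recall(ranked, labels, query_label, top_n):
    """Precision and recall after retrieving the first top_n ranked images."""
    retrieved = ranked[:top_n]
    hits = sum(1 for i in retrieved if labels[i] == query_label)
    relevant = sum(1 for label in labels if label == query_label)
    return hits / len(retrieved), hits / relevant


# Hypothetical toy feature database with two classes.
features = [[0, 0], [1, 1], [5, 5], [6, 5]]
labels = ["horse", "horse", "bus", "bus"]
order = rank([0, 0], features, sum_abs_diff)
print(precision_recall(order, labels, "horse", 2))
```

An exact match has distance zero under both measures, so it is always ranked first.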
All these parameters lie between 0 and 1, hence they can be
expressed as percentages. The newly introduced parameters
indicate better performance for higher values of LIRS and
lower values of LSRR [8-13]. The class wise performance of
the proposed algorithm, in terms of average precision and
recall cross over points for all sectors, for both Haar
Wavelet (row wise and column wise) and Kekre's Wavelet (row
wise and column wise), with the two similarity measures
Euclidean distance (ED) and sum of absolute difference (AD),
is shown in Figure 8 to Figure 11. The average value of each
method is plotted as a horizontal line to compare the
individual class performances. It is seen that sectorization
of column wise transformed images performs better than that
of row wise transformed images for both Haar and Kekre's
wavelets. The sum of absolute difference gives better
retrieval than Euclidean distance in all sectors except 16
sectors, for all classes of images. The retrieval performance
varies by class: Dinosaur, Flower, Sunset and Horses have the
highest retrieval rates, i.e. 80%, 70%, 50% and 50%
respectively.

Figure 12 depicts the overall average performance of Haar
and Kekre's wavelets. It shows that for sector sizes 4, 8 and
12, the Haar wavelet has better retrieval performance than
Kekre's wavelet. The sectorization of column wise transformed
images is far better, i.e. on average 45%, than row wise,
i.e. on average 30%. The performance of the proposed
algorithm is also checked with respect to the two new
parameters, LIRS and LSRR. The class wise performance of LIRS
and LSRR is shown in Figure 13 to Figure 20. The class having
the maximum value of average precision and recall cross over
point must have maximum LIRS and minimum LSRR. Taking the
example of the Dinosaur class, which has cross over points of
80% (row wise and column wise Haar and Kekre's Wavelet), it
has maximum LIRS (see Figures 13, 14, 17 and 18) and minimum
LSRR (see Figures 15, 16, 19 and 20). These parameters can
similarly be checked for the other classes. Thus these
parameters are very useful for checking retrieval performance
in CBIR.



Figure 8: Overall Average Precision and Recall performance
of Sectorization of Row wise Haar Wavelet, with Absolute
Difference (AD) and Euclidean Distance (ED) as similarity
measures.

Figure 9: Overall Average Precision and Recall performance
of Sectorization of Column wise Haar Wavelet, with Absolute
Difference (AD) and Euclidean Distance (ED) as similarity
measures.

Figure 10: Overall Average Precision and Recall
performance of Row wise Kekre's Wavelet Sectorization.

Figure 11: Overall Average Precision and Recall
performance of Column wise Kekre's Wavelet Sectorization
with Absolute Difference (AD) and Euclidean Distance (ED)
as similarity measures.

Figure 12: Comparison of Overall Precision and Recall
cross over points of Kekre's Wavelet and Haar Wavelet with
Absolute Difference (AD) and Euclidean Distance (ED) as
similarity measures.

Figure 13: The LIRS Plot of Row wise Haar transformed
images. Overall Average LIRS performances (shown with
horizontal lines): 0.068 (4 Sectors ED), 0.086 (4 Sectors
AD), 0.036 (8 Sectors ED), 0.038 (8 Sectors AD), 0.040 (12
Sectors ED), 0.066 (12 Sectors AD), 0.068 (16 Sectors ED),
0.088 (16 Sectors AD).

Figure 14: The LIRS Plot of Column wise Haar transformed
images. Overall Average LIRS performances (shown with
horizontal lines): 0.060 (4 Sectors ED), 0.074 (4 Sectors
AD), 0.063 (8 Sectors ED), 0.089 (8 Sectors AD), 0.061 (12
Sectors ED), 0.078 (12 Sectors AD), 0.029 (16 Sectors ED),
0.030 (16 Sectors AD).

Figure 15: The LSRR Plot of Row wise Haar transformed
images. Overall Average LSRR performances (shown with
horizontal lines): 0.83 (4 Sectors ED), 0.84 (4 Sectors AD),
0.87 (8 Sectors ED), 0.86 (8 Sectors AD), 0.88 (12 Sectors
ED), 0.86 (12 Sectors AD), 0.64 (16 Sectors ED), 0.67 (16
Sectors AD).

Figure 16: The LSRR Plot of Column wise Haar transformed
images. Overall Average LSRR performances (shown with
horizontal lines): 0.63 (4 Sectors ED), 0.65 (4 Sectors AD),
0.639 (8 Sectors ED), 0.638 (8 Sectors AD), 0.639 (12
Sectors ED), 0.633 (12 Sectors AD), 0.94 (16 Sectors ED),
0.84 (16 Sectors AD).

Figure 17: The LIRS Plot of Row wise KWT transformed
images. Overall Average LIRS performances (shown with
horizontal lines): 0.048 (4 Sectors ED), 0.059 (4 Sectors
AD), 0.044 (8 Sectors ED), 0.053 (8 Sectors AD), 0.067 (12
Sectors ED), 0.076 (12 Sectors AD), 0.070 (16 Sectors ED),
0.10 (16 Sectors AD).

Figure 18: The LIRS Plot of Column wise KWT transformed
images. Overall Average LIRS performances (shown with
horizontal lines): 0.061 (4 Sectors ED), 0.078 (4 Sectors
AD), 0.064 (8 Sectors ED), 0.091 (8 Sectors AD), 0.066 (12
Sectors ED), 0.090 (12 Sectors AD), 0.030 (16 Sectors ED),
0.048 (16 Sectors AD).

Figure 19: The LSRR Plot of Row wise KWT transformed
images. Overall Average LSRR performances (shown with
horizontal lines): 0.813 (4 Sectors ED), 0.807 (4 Sectors
AD), 0.085 (8 Sectors ED), 0.80 (8 Sectors AD), 0.85 (12
Sectors ED), 0.80 (12 Sectors AD), 0.63 (16 Sectors ED),
0.67 (16 Sectors AD).






Figure 20: The LSRR Plot of Column wise KWT transformed
images. Overall Average LSRR performances (shown with
horizontal lines): 0.639 (4 Sectors ED), 0.648 (4 Sectors
AD), 0.639 (8 Sectors ED), 0.637 (8 Sectors AD), 0.639 (12
Sectors ED), 0.637 (12 Sectors AD), 0.950 (16 Sectors ED),
0.873 (16 Sectors AD).

VI. CONCLUSION 

The work, carried out on a database of 1055 images in 12
different classes, discusses the performance of sectorization
of Haar wavelet and Kekre's wavelet transformed color images
for image retrieval. Both column wise and row wise
transformations have been examined. The performance of the
proposed method is checked with respect to various sector
sizes and two similarity measures, namely Euclidean distance
and sum of absolute difference. It has been observed that the
combination of column wise Haar wavelet sectorization with
sum of absolute difference and augmented feature vector gives
better performance than Kekre's Wavelet transform as far as
the overall average precision and recall cross over point is
concerned, as shown in Figure 12. The newly introduced
parameters LIRS and LSRR provide a good platform for
performance evaluation: LSRR judges how early all relevant
images are retrieved, and LIRS judges how many relevant
images are retrieved as part of the first set of relevant
retrievals. The sum of absolute difference is recommended as
the similarity measure because of its lower complexity and
better retrieval rate compared to Euclidean distance.

Abstract — The growing demand for last mile broadband access results from
the rapid growth of high-speed multimedia services for mobile, residential
and small business customers. Technologies based on 802.16 WiMAX
(Worldwide Interoperability for Microwave Access) promise high data rates
over long distances along with multimedia services, and are expected to
play a key role in high speed broadband services. The IEEE 802.16 WiMAX
standard provides a technique for building multi-hop mesh networks. Such a
mesh can act as a high speed wide-area wireless network and can provide
wireless coverage up to 5 miles with Line of Sight (LOS) transmission at a
bandwidth of around 70 Mbps. As the wireless environment varies
unpredictably, routing in wireless networks is challenging. IEEE 802.16
WiMAX routing faces several demands, such as delay, long transmission
scheduling, increasingly stringent Quality of Service (QoS) support, load
balance and fairness restrictions. The aim of this survey is to analyze
some of the routing protocols proposed by various authors for IEEE 802.16
WiMAX networks.

Keywords — IEEE 802.16, Routing Algorithm, Wireless mesh networks, 
Scheduling. 

I. Introduction 

In present-day telecommunications, networking and services are
changing rapidly to support the next generation Internet user
environment. Wireless networks will play a significant role in
supporting the next generation Internet. Wireless broadband
networks are being increasingly deployed and used in the last
mile for extending or enhancing Internet connectivity for fixed
and/or mobile clients situated on the edge of the wired network.

WiMAX is considered an important wireless technology with many
potential applications because of its high data rate, large
network coverage, strong QoS capabilities and low network
deployment and maintenance costs. It is expected to support many
business applications that require quality of service support.
WiMAX can be configured to operate in different modes, such as
point-to-multipoint (PMP) or Mesh mode, depending on
applications and network investment.

The rapidly increasing user demand for faster connections for
Web and VoIP services has led to the development of new
broadband access technologies in recent years. In 2004, the IEEE
802.16 standard, generally called WiMAX, was finalized in order
to provide last-mile fixed wireless broadband access in the
Metropolitan Area Network (MAN) with performance comparable to
conventional cable, DSL or T1 networks.

The operating frequency of IEEE 802.16 is 10 to 66 GHz for
Line-of-Sight (LOS) operation and 2 to 11 GHz for non
Line-of-Sight operation. Orthogonal frequency division
multiplexing (OFDM) is used in the physical layer in order to
support adaptive modulation and coding. Depending on channel
conditions, it can provide a data rate of up to 134 Mbps per base
station for each 28 MHz channel. An IEEE 802.16 network contains
a Base Station (BS) and multiple Subscriber Stations (SSs). The
Base Station acts as a gateway for the Subscriber Stations to the
external network, and each SS acts as an access point that
aggregates traffic from end users in a certain geographical area.

Most of the nodes in community wireless networks are either
stationary or minimally mobile. This leads routing protocols to
focus on improving network capacity or the performance of
individual transfers, rather than on coping with node mobility
or reducing power consumption. The major problem faced by such
networks is the loss of capacity caused by interference among
multiple concurrent transmissions. There are also certain basic
difficulties in routing in wireless networks. A routing model has
to work on both short and long time scales: a good wireless
routing protocol has to provide stable long-term routes while
also achieving opportunistic performance on short-term routes.
Wireless routing should also be robust against a wide spectrum
of soft and hard failures, ranging from transient channel
outages and links with intermediate loss rates to channel
disconnections, nodes under denial-of-service (DoS) attacks, and
failing nodes. The challenge for a routing protocol is to deal
with both of these problems. At the same time, it should scale to
support a large node population. The IEEE 802.16 protocol
provides random routing, in which parent nodes are selected at
random by the SSs while building the routing tree. This paper
presents some of the routing techniques proposed by different
authors for 802.16 WiMAX networks.

II. Literature Survey

Konark [1] proposed a routing and scheduling algorithm for radio
resource management (RRM) in IEEE 802.16 mesh backhaul networks.
The author investigates resource allocation in the IEEE 802.16
mesh backhaul network. Multipath routing is the main issue
considered, with the goal of using wireless radio resources
efficiently and thereby improving spectral efficiency. The main
characteristic of the scheduling technique is that it permits
dynamic dispatching of data blocks based on the current buffer
state and route conditions, without knowledge of the traffic
demand. It is therefore suited to the heterogeneous traffic load
supported by IEEE 802.16 networks, a strong candidate among
wireless networking technologies. The routing protocol uses load
demand information from the application layer and interference
information from the PHY layer, and is designed to select the
multi-hop path with the least mean path interference. The
scheduling technique is designed to find the maximum number of
concurrent transmissions that satisfy the
Signal-to-Interference-plus-Noise Ratio (SINR) constraints. In
both techniques, iterative allocation continues until there is
no unallocated capacity request.
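
The idea of packing concurrent transmissions subject to an SINR threshold, and iterating until no request is left, can be sketched as below. This is a simple greedy heuristic written for illustration, not the algorithm of [1]: it does not guarantee the maximum number of concurrent links, it ignores half-duplex constraints, and the gain matrix, transmit power, noise level and threshold are all hypothetical inputs.

```python
def sinr(link, active, gain, power, noise):
    """SINR at the receiver of `link` when every link in `active` transmits."""
    tx, rx = link
    signal = power * gain[tx][rx]
    interference = sum(power * gain[t][rx] for (t, _) in active if t != tx)
    return signal / (noise + interference)


def greedy_slot(links, gain, power, noise, threshold):
    """Add links to one time slot while all active links keep SINR >= threshold."""
    active = []
    for link in links:
        candidate = active + [link]
        if all(sinr(l, candidate, gain, power, noise) >= threshold for l in candidate):
            active = candidate
    return active


def schedule(links, gain, power, noise, threshold):
    """Iteratively fill slots until no link request remains unallocated."""
    pending = list(links)
    slots = []
    while pending:
        slot = greedy_slot(pending, gain, power, noise, threshold)
        if not slot:
            raise ValueError("a link cannot meet the SINR threshold even alone")
        slots.append(slot)
        pending = [l for l in pending if l not in slot]
    return slots
```

With weak cross-link gains the two links of a small example share one slot; with strong cross-link gains they are placed in separate slots.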

Kaarthick et al. [2] presented an adaptive routing algorithm to
support distributed services in WiMAX. In recent years, IEEE
802.16 has been considered a cost effective solution for
broadband Internet access for stationary and mobile hosts. A
WiMAX network can be equipped with distributed services in order
to serve many customers efficiently. The authors propose an
adaptive routing technique that computes bandwidth guaranteed
paths using disciplined flooding and proxy setup to provide
distributed services in IEEE 802.16e. The performance of the
algorithm is compared against AODV, which acts as the benchmark
algorithm. The evaluation is based on the following four metrics:

• Route discovery time 

• Delay 

• Total errors sent 

• Total packets dropped 

Susana et al. [3] put forth a routing protocol for hybrid
WiFi-WiMAX networks. The proliferation of Wireless Local Area
Networks (WLANs) has supported the growth of multihop routing
protocols. In addition, the need to cover larger areas has led
to the development of new standards for Wireless Metropolitan
Area Networks (WMANs). The authors develop a new routing
technique that combines WLANs and WMANs, which results in better
interconnectivity.



Kuran et al. [4] presented cross-layer routing-scheduling in
IEEE 802.16 mesh networks. For Internet Protocol-based
fourth-generation (4G) wireless communication systems, broadband
wireless access networks will be a fundamental component of a
convergent and pervasive networking architecture. One of the
major technologies for broadband wireless access is IEEE 802.16
Mobile WiMAX. Integrating WiMAX with next-generation broadband
networks poses several challenges, such as diverse operational
environments, increasingly stringent QoS support, power/coverage
limitations and capacity boundaries. A possible solution to
these problems is the mesh operation mode of IEEE 802.16. The
authors propose a cross-layer routing-scheduling scheme for IEEE
802.16 mesh networks. The scheme uses the distributed and
centralized scheduling capabilities of the IEEE 802.16 link
layer in mesh mode together with routing in the network layer,
and builds on the mechanisms of the IEEE 802.16 protocol. The
experimental results indicate that this method can considerably
improve network performance, particularly when the Internet part
of the traffic is congested, at the cost of a minor burden on
the intranet traffic in the form of a slight increase in
end-to-end delay.

Shiying [5] proposed joint admission control and routing in IEEE
802.16-based mesh networks. The paper considers quality of
service (QoS) provisioning techniques in WiMAX-based
metropolitan area mesh networks. The author studies connection
admission control (CAC) and routing in the design and operation
of wireless multihop mesh networks, and proposes a joint
connection admission control and routing technique for multiple
service classes with the aim of maximizing the overall revenue
from all admitted connections. Connection-level QoS constraints,
such as the handoff connection dropping probability, can be kept
within a threshold. By assigning different reward rates,
multiple service classes can be prioritized according to their
importance. The optimal CAC policies are then obtained by
applying optimization techniques, with the long-run average
reward as the optimality criterion. The proposed technique
maximizes the revenue obtainable by the system under QoS
constraints, and the author shows that the optimal joint policy
is a randomized policy. This means that connections are admitted
to the system with some probability when the system is in
certain states.

Wan [6] presented interference aware routing and scheduling in
WiMAX backhaul networks with smart antennas. A smart adaptive
antenna can be used for intended communications and interference
suppression, as it offers multiple Degrees of Freedom (DOFs).
Combining smart antennas with a WiMAX system can appreciably
enhance network throughput through more efficient spatial reuse.
The author considers routing and scheduling in WiMAX backhaul
networks with smart antennas. For routing, the author formally
defines the Interference-aware Tree Construction Problem (ITCP),
which takes full account of interference impact and DOF
availability, and proposes a technique that solves it in
polynomial time. For scheduling, the author first presents a
polynomial-time optimal technique for the special case in which
the number of DOFs at every node is sufficient to suppress all
potential secondary interference. Finally, an effective
heuristic algorithm is proposed for the general scheduling
problem.

Jin et al. [7] put forth routing and packet scheduling for
throughput maximization in IEEE 802.16 mesh networks. The paper
considers the problem of maximizing the system throughput in
IEEE 802.16 broadband access networks with mesh topology. First,
a simplified linear network with only uplink traffic is
considered, for which the authors present an optimal scheduling
technique and derive an analytical result on the length of the
schedule. The authors then consider the problem of routing and
packet scheduling in general topologies and prove its
NP-completeness. They also give an ILP formulation of this
problem. Finally, the authors propose techniques that find
routes and schedules of packet transmissions in general mesh
topologies based on the optimal algorithm for linear networks.

A routing metric and algorithm for IEEE 802.16 mesh networks is
provided by Ntsibane [8]. Technologies based on 802.16 enable
high data rates over large distances as well as multimedia
services. This technology is also likely to provide high speed
broadband delivery beyond the current 3rd Generation wireless
technologies. The mesh mode of this technology can extend
coverage well beyond cities and into rural areas that are
presently not served by conventional techniques. The author
considers the potential of the mesh mode and provides a routing
technique that is appropriate for coordinated distributed
scheduling.

Nazari et al. [9] made the case for mobility- and traffic-driven
routing algorithms for wireless access mesh networks. The
authors present a new approach to developing routing algorithms:
they first try to understand the characteristics of the network
(i.e. network connectivity, mobility and the rate of change of
the topology) and the expected traffic patterns for a specific
mobile scenario before starting to design the algorithm, so that
routing performance can be optimized. The paper applies this
approach to Triton, a proposed 802.16-based (WiMAX) maritime
wireless access mesh network. The trace-based analysis shows
that when stationary nodes are preferred in route selection, the
rate of change of routes to the gateway nodes is reduced by
23.3% and the average time for which routes between a node and a
gateway remain valid increases by 31%. The authors argue that it
is important to take the expected traffic patterns into account
when designing the routing for a specific system. Since the
network topology does not affect the expected traffic, overheads
are reduced.

Ben-Jye Chang et al. [10] discussed an adaptive competitive
on-line routing algorithm for IEEE 802.16j WiMAX multi-hop relay
networks. IEEE 802.16j is a relay based standard, proposed for
WiMAX, that builds on IEEE 802.16e. It is intended mainly to
widen the service area of Base Stations (BSs) and to improve
received signal strength (RSS) quality. The main advantages of
IEEE 802.16j are that the cost of building IEEE 802.16 WiMAX
networks is comparatively low and that it is compatible with
existing WiMAX standards. Based on their mobility and relay
range, Relay Stations (RSs) can be grouped into three types:
Fixed RS (FRS), Nomadic RS (NRS) and Mobile RS (MRS). The
different types of RSs in a relay-based WiMAX network and the
routing path between a Mobile Station (MS) and the MR-BS are the
two important factors in constructing an efficient relay-based
WiMAX network. The authors therefore propose an IEEE
802.16j-conformed relay-based adaptive competitive on-line
routing approach, in which a multihop optimal path is selected
in terms of link bandwidth, path length and channel condition.
Numerical results show that the proposed approach significantly
outperforms other approaches in terms of Fractional Reward Loss
(FRL).

Al-Hemyari et al. [11] presented centralized scheduling and
routing tree construction in WiMAX mesh networks. IEEE 802.16
emerged in response to the strong demand for high speed Internet
access in the last few years. The IEEE 802.16 working group
defined broadband wireless access (BWA), leading to the
worldwide interoperability for microwave access (WiMAX)
standard. The standard is used for wireless metropolitan area
networks (MANs) in order to provide broadband wireless access
over several miles, easy deployment, and high data rates over a
large area. A single channel single transceiver scheme in a
WiMAX mesh network is used here to obtain efficient routing and
collision free centralized scheduling (CS) algorithms, which
introduce the cross layer concept between the network layer and
the media access control (MAC) layer. The authors' proposed
method improves system performance with respect to scheduling
length, channel utilization ratio (CUR) and system throughput
when compared to other systems.

Qassem et al. [12] examined cross-layer routing and scheduling
for IEEE 802.16 mesh networks. In recent years, the requirement
for high-speed Internet access and multimedia services has
increased greatly. IEEE 802.16 defines the wireless broadband
access technology called WiMAX (Worldwide Interoperability for
Microwave Access), which aims to provide broadband wireless
access over long distances, easy deployment, and high data rates
over a large area. The authors propose an Energy/bit
Minimization routing and centralized scheduling (EbM-CS)
algorithm for multi-transceiver WiMAX mesh networks (WMNs),
which introduces the cross-layer concept between the MAC and
network layers. The results show that the proposed algorithm
improves performance in terms of system throughput.

Al-Hemyari et al. [13] described the construction of a routing
tree for centralized scheduling using a multi-channel system in
802.16. The IEEE 802.16 standard describes the WiMAX (worldwide
interoperability for microwave access) mesh network, which uses
the base station (BS) as a coordinator for centralized
scheduling. The paper presents a centralized scheduling
algorithm that builds a routing tree in a WiMAX mesh network,
introducing the cross-layer concept between the media access
control (MAC) and network layers. Interference, hop-count,
spatial reuse and quality of service (QoS) guarantees are taken
into consideration. Each node has one transceiver that can be
tuned between multiple channels, which eliminates secondary
interference. The work shows that this algorithm improves the
length of scheduling, the channel utilization ratio (CUR) and
the average transmission scheduling.

Al-Hemyari et al. [14] explained the construction of a routing
tree for centralized scheduling using a multi-channel single
transceiver system in 802.16 mesh mode. WiMAX mesh networks
based on the IEEE 802.16 standard were developed to support
centralized scheduling, in which the base station acts as a
coordinator. However, interference from the transmissions of
neighboring nodes within the mesh network usually cannot be
avoided. By constructing the routing tree with a multi-channel
single transceiver system, this interference can be reduced. In
this algorithm, each node has one transceiver that can be tuned
to any of the channels, as selected by the user, which
eliminates the secondary interference that occurs in the
network. The parameters considered are interference, hop-count,
number of children per node, spatial reuse, fairness, load
balancing, quality of service (QoS) and node identifier (ID).
The analysis results show that the proposed algorithm
significantly improves the length of scheduling and the channel
utilization ratio (CUR).

Xiaohua Jia [15] illustrated a distributed algorithm of
delay-bounded multicast routing for multimedia applications in
wide area networks. The author considers how to find a good
route in a wireless network, using spectral efficiency as the
performance measure for routing. The main aim of the study is to
merge perspectives from networking and information theory in the
design of routing techniques. Finding the optimum route with the
maximum spectral efficiency in a distributed manner is very
hard. The author therefore presents two suboptimal alternatives
motivated by information-theoretic analysis: the approximately
ideal-path routing (AIPR) technique and the distributed spectrum
efficient routing (DSER) technique. AIPR needs location
information and finds a path that approximates an optimum
regular path. DSER is based on the Bellman-Ford or Dijkstra's
algorithms, which are well suited to distributed
implementations. For random networks, the spectral efficiencies
of AIPR and DSER are higher than that of nearest-neighbor
routing in the low signal-to-noise ratio (SNR) regime and that
of single-hop routing in the high SNR regime. In the moderate
SNR regime, the spectral efficiency of DSER is up to twice that
of nearest-neighbor or single-hop routing.

Jilin Le et al. [16] put forth DCAR, a Distributed Coding-
Aware Routing technique for wireless networks. Network coding
has attracted interest in recent years as a way to enhance
the performance of wireless networks. For example, COPE is a
practical wireless coding system that demonstrates the
throughput gain achievable by network coding. Still, COPE has
two basic limitations:

• The coding opportunity depends crucially on the
established routes.

• The coding structure in COPE is restricted to a two-hop
region.

To overcome these limitations, the authors propose the
distributed coding-aware routing (DCAR) technique, which
provides

• The detection of existing paths between a given
source and destination.

• The detection of possible network coding
opportunities over a much wider network region.

Conventional techniques have little ability to discover high
throughput paths, whereas DCAR can discover high throughput
paths with coding opportunities. DCAR overcomes the limitations
of COPE because it can detect coding opportunities on the
entire path. The authors also propose a new routing metric,
the coding-aware routing metric (CRM), to support performance
comparison between coding-possible and coding-impossible
paths.

Honggang Wang et al. [17] studied the interplay between routing
and distributed source coding in wireless sensor networks.
In applications such as real-time target tracking and
environment monitoring, mission-driven wireless sensor networks
(WSN) use distributed source coding (DSC) to code the data of
multiple correlated sensors. The features of these DSC
applications offer major opportunities for sensor networks, and
the technique uses multirate transmission to enhance network
performance. The authors studied techniques for optimizing the
interplay between routing and DSC in WSN and proposed a new
multirate-based routing technique for mission-driven DSC
applications that significantly extends network lifetime. The
proposed technique assigns rates according to residual energy.
To satisfy the end-to-end transmission rate constraint, the
information precision requirement and the energy constraints of
the network for DSC, it uses a joint rate and energy scheduling
mechanism. Experimental results show that the proposed
multirate-based routing scheme achieves a significantly longer
network lifetime than the conventional techniques.

Wenjun Liu et al. [18] proposed a grid-based distributed
multi-hop routing protocol (GDRP) for wireless sensor networks.
The main design goals for a wireless sensor network routing
technique are a high delivery ratio with low energy consumption
and low transmission delay. In GDRP only one node per grid is
chosen as grid head at a time, and the nodes of a grid take on
the grid head tasks in turn by dynamic rotation. Inter-grid
communication uses multi-hop routing to reduce the energy
consumed by grid heads. Every grid head runs a distributed
algorithm and independently chooses an optimal next-hop routing
path according to the routing cost, distance and residual energy
of neighboring grid heads. Simulation results show that GDRP
balances energy consumption well, which leads to a high data
delivery ratio, low transmission delay and prolonged network
lifetime.

Yamamoto et al. [19] analyzed a distributed route selection
scheme for wireless ad hoc networks. The capacity region of ad
hoc networks is usually analyzed by means of optimal routing or
scheduling. Distributed network control makes the network
scalable because it can be implemented without centralized
information. On the other hand, selfish nodes can severely
degrade network performance, and network capacity is the
factor most affected. The authors use game theory to examine
the network capacity region obtained under distributed route
selection. The results show that, even with optimal routing,
rational selfish nodes cannot each find a unique route under
the assumption that every node knows not only its own end-to-
end throughput but also that of all other nodes resulting from
its own choice.

Tzu-Chieh Tsai et al. [20] addressed routing and admission
control in IEEE 802.16 distributed mesh networks. QoS is one
of the challenging issues in wireless mesh networks. The
authors propose a new routing method that uses SWEB as its
metric and is well suited to the IEEE 802.16 distributed,
coordinated mesh mode. They also propose an admission control
algorithm based on a token bucket mechanism. The token bucket
controls the traffic pattern on the information path, which
helps to estimate the bandwidth required by a connection. The
estimate takes the hop count and the delay requirements of
real-time traffic into account. The delay requirements of
real-time traffic are the main concern in the design of TAC,
which also avoids starvation of low-priority traffic. The
proposed routing technique, the admission control algorithm
and the inherent QoS support of the IEEE 802.16 mesh mode
together establish a QoS-enabled environment. Extensive
simulations validate the algorithms and show good performance.
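The token bucket idea mentioned above can be illustrated with a minimal sketch. This is a generic token bucket, not the admission control algorithm of [20]; the class name, the parameters `rate` and `capacity`, and the explicit `now` timestamp are assumptions made for the illustration.

```python
class TokenBucket:
    """Generic token bucket (illustrative; not the TAC algorithm of [20])."""

    def __init__(self, rate, capacity):
        self.rate = rate          # tokens added per second (assumed unit)
        self.capacity = capacity  # maximum burst size in tokens
        self.tokens = capacity    # the bucket starts full
        self.last = 0.0           # time of the last update, in seconds

    def allow(self, now, size=1.0):
        """Return True if a request of `size` tokens conforms at time `now`."""
        elapsed = max(0.0, now - self.last)
        # Refill in proportion to elapsed time, never beyond the capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if size <= self.tokens:
            self.tokens -= size
            return True
        return False
```

A connection whose traffic keeps passing `allow` stays within the long-term rate `rate` with bursts of at most `capacity`, which is the kind of bound a bandwidth estimate can be built on.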

Yajun Li et al. [21] presented a novel distributed routing
algorithm for IEEE 802.16/WiMAX based mesh networks. The
algorithm is not designed to remove traffic delay completely;
instead it provides routes with minimum end-to-end delay for
traffic flows, so that congestion can be avoided. The
algorithm is incorporated into the medium access control (MAC)
layer. Each node determines whether its next-hop nodes are
free from traffic: if it finds traffic on the path, the
information is allotted to the next free path; otherwise the
node attempts to forward packets in the earliest slots. In
addition, a mechanism called loop cancelation is proposed to
avoid being trapped in path loops, which guarantees the
accessibility of the algorithm. The results reveal that the
proposed system can considerably reduce the delay of traffic
flows and also achieve load balance to a certain degree.

Saha et al. [22] addressed routing in IEEE 802.16 based
distributed wireless mesh networks. Wireless mesh networks now
play a vital role in telecommunications and networking.
Because of dynamic channel conditions and the lack of
infrastructure, routing in distributed wireless mesh networks
is a challenging problem. The authors introduce a technique
that distributes traffic over multiple paths, which helps to
avoid data delay on the transmission path. They propose an
analytical model that calculates the transmission delay over
multiple hops by applying queuing analysis to the intermediate
nodes on the routes. Simulation is carried out to support the
analytical results on reducing delay in the network.

III. Conclusion 

This survey reviewed a number of routing protocols for WiMAX
based networks. Routing in WiMAX is an active area of
research in which many techniques have been proposed to
increase throughput, minimize delay and offer further
robustness over the wireless channel. The survey presents the
advantages and disadvantages of different routing algorithms
for 802.16 WiMAX networks, which helps in choosing the best
suited routing protocol. The main challenges for routing in
WiMAX are delay, long transmission scheduling, increasingly
stringent Quality of Service (QoS) requirements, and load
balance and fairness limitations. Many of the conventional
routing techniques do not meet all of these challenges. This
analysis suggests that a joint and distributed routing protocol
can achieve all the qualities mentioned above, so such a
protocol can be used in 802.16 WiMAX networks to improve
routing compared with the conventional techniques.
Abstract

In this paper, we devise a novel algorithmic
approach for transmitting information using the Fast
Comparison Encryption (FCE) algorithm. The proposed
scheme uses FCE to transform the information into an
encoded Godel Number Sequence (GNS), which results in
a text. The text is reconstructed at the other end
using the inverse process.

Key Words: GNS, FCE. 

I. INTRODUCTION 

In simple terms, authentication is 
identification plus verification. Identification is the 
process whereby an entity identity, rather than one-way 
authentication, whereby only one principal verifies the 
identity of the other principal, is usually required. 
There are three main types of authentication in a 
computing system [4]: 

a. Message content authentication -verifying that the 
content of a received message is the same as when it 
was sent; in a computing environment. 

b. Message origin authentication - verifying that the 
sender of a received message is the same one recorded 
in the sender field of a message; and 

c. General identity authentication - verifying that a 
principal's identity is as claimed. 

Security may be lacking when a volume of
data is transferred from its source to the destination if
no measure is taken to protect it. For one reason or
another, most of the data being transmitted must be
kept secret from others [5]. A very important reason to
encode data or messages is to keep them secure. In
this paper, a novel method for message authentication
is proposed. It is an efficient encryption scheme that
uses the Godel number sequence (GNS) [1] and the Fast
comparison encryption scheme (FCE) [2]. FCE uses
any block cipher to encrypt only a few bytes of random
seeds in each page of the database, and uses lighter-
weight computation to encrypt the actual data. The low
overhead of FCE enables efficient comparison and,
therefore, efficient indexing on the ciphertext. In this
work, we specifically aim at encryption to ensure the
security of on-disk data. FCE is specifically tailored to
database systems in that comparison is
fast, which facilitates the search of indices.

II. GODEL NUMBER SEQUENCE 

A mathematical concept termed Godelization
[1] is used as an encoding scheme. The scheme is
explained as follows. The prime factorization theorem
states that every positive integer greater than one can
be factored into a product of primes, and that this
factorization is unique except for the order of the
factors. To factor a number 'n' is to write it as a
product of prime numbers, for example:

n = a x b x c

Factoring a number is relatively hard compared to
multiplying the factors together to generate the
number.

For any natural number 'n', the
Godel number sequence (GNS) is given by:

$\mathrm{GNS}(n) = (x_0, x_1, x_2, \ldots, x_k)$ where

$n = 2^{x_0} \cdot 3^{x_1} \cdot 5^{x_2} \cdots \mathrm{PrNo}(k)^{x_k}$ and PrNo(k) is
the k-th prime.

For example, $90 = 2^1 \cdot 3^2 \cdot 5^1$, so

$\mathrm{GNS}(90) = (1, 2, 1)$
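The definition above can be sketched in a few lines of Python. This is only an illustration of the exponent sequence; the function names `is_prime` and `gns` are assumptions, and trial division is used purely for brevity.

```python
def is_prime(m):
    """Trial-division primality test (adequate for small illustrative inputs)."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True


def gns(n):
    """Return the exponents of 2, 3, 5, ... in the prime factorization of n."""
    if n < 2:
        raise ValueError("n must be an integer greater than one")
    exponents = []
    p = 2
    while n > 1:
        e = 0
        while n % p == 0:  # count how many times the current prime divides n
            n //= p
            e += 1
        exponents.append(e)
        p += 1
        while not is_prime(p):  # advance to the next prime
            p += 1
    return exponents
```

For the worked example in the text, `gns(90)` gives `[1, 2, 1]`.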

The Godel number sequence [1] will be 
encoded by using Fast comparison Encryption for 
improving the security of the information and reduce 
the complexity of the computation. Fast comparison 
encryption scheme is very light weight mechanism. 
This will be described in the following section. 
A. Inverse Godelization

At the receiver side, the inverse of the Godelization
technique must be performed to obtain the original
data. The received string is first decompressed by
replacing alphabets with digits and replacing any
substring KX with K occurrences of X. The string
obtained has the form

GNS(i_1)$GNS(i_2)$...$GNS(i_n)

which is the Godel string of the image. Inverse
Godelization is applied to this string to obtain the
intensity values of the image, which are calculated from
$\mathrm{GNS}(i) = (x_0, x_1, \ldots, x_k)$ as

$i = 2^{x_0} \cdot 3^{x_1} \cdot 5^{x_2} \cdots \mathrm{PrNo}(k)^{x_k}$
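The inverse step, recovering a value from its exponent sequence, can be sketched as follows. The helper names `first_primes` and `inverse_gns` are assumptions for illustration; the string decompression described above is not covered.

```python
def first_primes(count):
    """Return the first `count` primes: 2, 3, 5, ..."""
    primes = []
    candidate = 2
    while len(primes) < count:
        # A candidate is prime if no smaller prime divides it.
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


def inverse_gns(exponents):
    """Multiply out 2**x0 * 3**x1 * 5**x2 * ... for the given exponents."""
    n = 1
    for p, e in zip(first_primes(len(exponents)), exponents):
        n *= p ** e
    return n
```

Applied to the sequence (1, 2, 1) from the earlier example, `inverse_gns([1, 2, 1])` returns 90.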

III. FAST COMPARISON ENCRYPTION

A plaintext is encrypted using FCE [2]. Let the
size of the GNS of the original information be 'P'
bytes; it is converted into bits for computation. Let
the key be denoted by 'K' (1 byte). It is generated
randomly from the input using a random permutation
function (Perfun). |K| gives the length of the key.

A. Key Generation

Perfun:

It is a random permutation function
{1, 2, ..., P} → {1, 2, ..., P}

Step 1: Let S_j be the starting bit of the key.
For j = 1 to P

S_j = Perfun(j) mod |K|, which is in the range
[0, |K|-1].

Step 2: S_j is the starting bit of the key 'K'.

B. Encryption Algorithm

Input: Plain text (P bytes), randomly generated key K
(symmetric), and random permutation function.
Output: Cipher text (En)

Algorithm

Step 1: Generate the GNS.

Step 2: Find S_j.

Step 3: Find K.

Step 4: En_i = K ⊕ P_i.

The Godel number sequence (GNS) [1] is generated for
each byte of the plain text separately. The GNS of each
byte is encrypted using the fast comparison encryption
scheme (FCE) [2] and sent to the receiver. The cipher
text byte En_i of the plaintext byte P_i is simply the
bitwise XOR of P_i with K. This process is shown in
Figure 1.
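The final XOR step can be sketched as below. This illustrates only the byte-wise operation En_i = K ⊕ P_i with a single one-byte key; the key derivation through Perfun is omitted, and the function name `xor_bytes` is an assumption. Because XOR is its own inverse, the same function also performs the decryption P_i = K ⊕ En_i.

```python
def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte of `data` with a one-byte key (0..255)."""
    if not 0 <= key <= 255:
        raise ValueError("key must fit in one byte")
    return bytes(b ^ key for b in data)


# Example: the GNS (1, 2, 1) stored as three bytes, with the key
# 01100110 that appears in the console output of the Results section.
key = 0b01100110
plain = bytes([1, 2, 1])
cipher = xor_bytes(plain, key)
recovered = xor_bytes(cipher, key)  # applying the same key again decrypts
```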



Figure 1. Encryption algorithm (Input → Generate GNS → Encrypt by FCE → Output)

C. Decryption Algorithm

Input: Cipher text (En_i), K.
Output: Plain text

Algorithm

Step 1: Decrypt the data.

P_i = K ⊕ En_i

Step 2: Apply reverse Godelization on 'P'.

Each byte of the encrypted text is decrypted using FCE,
and the original text is recovered by applying reverse
Godelization. This is shown in Figure 2.



Figure 2. Decryption algorithm (Input, encrypted → Decrypt by FCE → Reverse Godelization → Output, decrypted)

IV. Proposed Methodology 

The main contribution of this paper is the
development of new algorithms that increase the data
payload capacity over regular methods and follow
a layered approach to encoding and authenticity for
greater robustness. The Godelization method [1]
combined with the FCE method [2, 3] is used
as the embedding technique. The proposed methodology is
based on the mathematical concept of Godelization,
which serves as the first encoding scheme. The Fast
Comparison Encryption technique is then applied as a
second layer. The implementation results of
the proposed method are shown in Figure 3 and Figure
4.
V. Results

Figure 3. Encryption by FCE with GNS (console output reporting file lengths in bytes: sample.doc 32256; sample64.doc 64512; a file whose name is truncated in the source, "...ain.doc", 278528; PIF.doc 57344; key 01100110; enc.doc 565248)



Figure 4. Decryption by FCE with reverse GNS (console output reporting file lengths in bytes: enc.doc 565248; Dec.doc 278528)
ABSTRACT: This paper discusses methods for
obtaining proper geometric coordinates of a sample
object that has to be rapid prototyped. The
coordinates of the object are obtained using a
Radial Basis Function (RBF) network. The training is
done with many sample objects. The distance traveled
by the rapid prototyping machine is expected to be
minimal when the software follows the
geometric coordinates produced by the RBF.

Key words: Rapid Prototyping, Artificial 
Neural Network, Radial Basis Function. 



1. INTRODUCTION 



Rapid prototyping (RP) refers to a variety of
specialized equipment, software and materials
capable of using 3D computer aided design
(CAD) [5] data input to directly fabricate
geometrically complex objects. RP technologies
have emerged as a key element in saving time through
their ability to shorten the product design and
development process [2]. This highly innovative
and cost efficient technology has found
applications in automotive, aerospace and
medical equipment manufacturing, replacing the
commonly used slower and less accurate manual
methods of fabricating prototypes [4]. With
advances in established technologies and materials
and the introduction of new methods, selecting
the right RP machine has become much more
difficult and is one of the most important
decisions to be made when employing any RP
technology. This is vital in minimizing build
time and cost and achieving optimal accuracy. When
making this decision, the designers and RP
machine operators should consider a number of
different processes and specific constraints. This
may be a difficult and time consuming task.

The RP material flows through an
orifice and comes out in the form of drops. The
size of a drop depends on the speed at which
the wire comes out and on the solidification of the
material. For example, a 1 mm drop is placed in a 1
mm cube cavity to obtain a cube of the same size
after solidification within a fraction of a second [1].
The sides of the cube should be flat in all respects. To
achieve this, the focus is on a method
that shows how to do so using the critical path
method (CPM) [6]. Some
products have been chosen together with their
applications, particularly in the medical area. By
considering all the parameters, any
kind of object can be produced in a shorter
time without difficulty [3].

2. MATERIALS AND METHODS 



2.1 Materials 

A schematic flow of the proposed work is
presented in Figure 1.

Rapid Model: The end product that has to
be rapid prototyped.

Coordinates: Various coordinates are
measured from the RP model through a
CMM, reverse engineering or existing drawing
details.

Sizes: The length, width/thickness,
breadth/height and other profiles are calculated
from the coordinates.

RBF: Coordinates and sizes of sample RP
models are used as data for training the RBF
neural network to obtain the final weights that will
be used for testing.

Obtain format to meet RP M/c: The outputs
of the RBF are used as inputs for the RP M/c converter,
where the RP model will be developed.



Fig. 1 Schematic flow (RP model → Coordinates → Define sizes → Input to RBF → Obtain format to meet RP M/c)



2.2 Methods 



The concept of distance measure is
used to associate the input and output pattern
values. Radial Basis Functions are capable of
producing approximations to an unknown
function 't' from a set of input data abscissae. The
approximation is produced by passing an input
point through a set of basis functions, each of
which contains one of the RBF centres,
multiplying the result of each function by a
coefficient and then summing them linearly.

For each function 't', the
approximation to this function is essentially
stored in the coefficients and centres of the RBF.
These parameters are in no way unique, since for
each function 't' being approximated, many
combinations of parameter values exist. RBFs
have the following mathematical representation:

$F(x) = c_0 + \sum_{i} c_i\,\phi(\lVert x - R_i \rVert)$    (1)

where

c is a vector containing the
coefficients of the RBF,

R is a vector containing the
centres of the RBF, and

φ is the basis function or
activation function of the network.

Implementation

Step 1: Apply the Radial Basis Function.
No. of inputs = 15
No. of patterns = 6
No. of centres = 6
Calculate the RBF as
RBF = exp(-X)
Calculate the matrix as
G = RBF
A = G^T * G
Calculate
B = A^(-1)
Calculate
E = B * G^T
Step 2: Calculate the final weights.

F = E * D
Step 3: Store the final weights in a file.
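The weight calculation in the steps above, F = (G^T G)^(-1) G^T D, can be sketched in plain Python. Two assumptions are made here that the text does not state: X is taken to be the squared Euclidean distance between a pattern and a centre, and D is the matrix of target values. All function names are hypothetical.

```python
import math


def transpose(m):
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def inverse(m):
    """Gauss-Jordan inverse of a square matrix with partial pivoting."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def rbf_matrix(patterns, centres):
    # G[p][c] = exp(-X), with X assumed to be the squared distance
    # between pattern p and centre c.
    return [[math.exp(-sum((a - b) ** 2 for a, b in zip(p, c)))
             for c in centres] for p in patterns]


def train_rbf(patterns, centres, targets):
    g = rbf_matrix(patterns, centres)   # G = RBF
    gt = transpose(g)
    b = inverse(matmul(gt, g))          # B = (G^T G)^(-1)
    e = matmul(b, gt)                   # E = B G^T
    return matmul(e, targets)           # F = E D


def predict(x, centres, weights):
    return matmul(rbf_matrix([x], centres), weights)[0]
```

When the centres coincide with the training patterns, as with 6 patterns and 6 centres, G is square and the trained network reproduces the training targets up to rounding error.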



3. EXPERIMENT SET UP



Six RP models have been considered as
examples for testing the RBF network. Each RP
model has been labeled with Cartesian
coordinates. The models have been developed
using CAD software and are defined
by a definite number of points. The distances
between points are calculated internally by the
program. During RBF training, only the point
coordinates are presented to the input layer. The
number of centres used is 6 and the number of
targets is 15.

Table 1 presents the 6 sample RP models under
consideration. Table 2 presents the number of points
considered in this analysis for each RP model.
Tables 3a-c present the actual coordinates in mm for
each point. The total number of points
considered is 15 in each object.
Table 1 Sample RP models









Table 2 Number of points in the RP model 



RP 



Number of points 



10 



12 



15 



Table 3a Cartesian coordinate 



P1



P2 



P3 



P4 



P5 



50 



50 



50 







50 



50 







50 




9.08 



38.47 



47.55 



27.95 



23.77 



45.22 



27.95 



12.5 



37.5 



50 



21.65 



37.5 



43.30 



12.5 



43.30 



13.52 



32.66 



46.19 



13.52 



46.19 



32.66 



32.66 



46.19 



25 



12.5 



21.65 



12.5 



7.22 



50 



25 



25 



25 



25 



12.5 



7.22 



50 



4. RESULTS AND DISCUSSION

The coordinates of the RP models are learnt
by the RBF. Table 4 presents the outputs of the RBF for
all 6 RP models for the points p1 and p2. Similarly
close outputs are obtained for points p3, p4, p5,
p6, p7, p8, p9, p10, p11, p12, p13, p14 and p15.

Conclusion: This work has made an attempt
to train an RBF with RP model coordinates. During
the actual implementation, the RP model
coordinates are given as inputs to the RBF to
obtain the actual coordinates that help in RP
modeling.
Table 3b Cartesian coordinate




P6 


P7 


P8 


P9 


P10 




X 


y 


z 


X 


y 


z 


X 


y 


z 


X 


y 


z 


X 


y 


z 


1 


50 





50 


50 


50 


50 





50 


50 


X 


X 


X 


X 


X 


X 


2 


9.08 





50 


38.47 





50 


47.55 


27.95 


50 


23.77 


45.22 


50 





27.95 


50 


3 





21.65 





12.5 





50 


37.5 





50 


50 


21.65 


50 


37.5 


43.30 


50 


4 


13.52 


46.19 








32.66 








13.52 





13.52 





50 


32.66 





50 


5 


X 


X 


X 


X 


X 


X 


X 


X 


X 


X 


X 


X 


X 


X 


X 


6 


X 


X 


X 


X 


X 


X 


X 


X 


X 


X 


X 


X 


X 


X 


X 



Table 3c Cartesian coordinate 
















P11


P12 


P13 


P14 


P15 




X 


y 


z 


X 


y 


z 


X 


y 


z 


X 


y 


z 


X 


y 


z 


1 


X 


X 


X 


X 


X 


X 


X 


X 


X 


X 


X 


X 


X 


X 


X 


2 


X 


X 


X 


X 


X 


X 


X 


X 


X 


X 


X 


X 


X 


X 


X 


3 


12.5 


43.30 


50 





21.65 


50 


X 


X 


X 


X 


X 


X 


X 


X 


X 


4 


46.19 


13.52 


50 


46.19 


32.66 


50 


32.66 


46.19 


50 


13.52 


46.19 


50 





32.66 


50 


5 


X 


X 


X 


X 


X 


X 


X 


X 


X 


X 


X 


X 


X 


X 


X 


6 


X 


X 


X 


X 


X 


X 


X 


X 


X 


X 


X 


X 


X 


X 


X 


X represents no coordinates 
Table 4 RBF outputs (six plots: the X-, Y- and Z-coordinates of P1 and of P2, each comparing the target with the RBF output against the RP model number)
Abstract: This paper presents Heschl's gyrus
auditory cortex slice registration using an Echo state
neural network (ESNN). The network is trained
with the translation and rotation values of selected
points (feature points) from two images at a time
(source and target images). The input layer is given the
coordinates of the selected points of the source image,
and in the output layer the labels are the translation
and rotation values of the selected points of the target
image. ESNN is an estimation network that estimates
the required registration information from the selected
points of the target and source images. The output of ESNN
is compared with that of a radial basis function (RBF)
network.

Keywords-Echo state neural network, functional 
magnetic resonance imaging (fMRI), Heschl's gyrus, 
auditory cortex 
analysis [2][3]. A fundamental problem in medical
image analysis is the integration of information from
multiple images of the same subject, acquired using
the same or different imaging modalities and possibly
at different time points. One essential aspect thereof
is image registration, i.e., recovering the geometric
relationship between corresponding points in multiple
images of the same scene. While various more or less
automated approaches for image registration have
been proposed in the field of medical imaging and
image analysis, one strategy in particular, namely
maximization of mutual information [5], has been
extremely successful at automatically computing the
registration of 3-D multimodal medical images of
various organs from the image content itself.



I. INTRODUCTION 

The image registration [1] aims to find a 
transformation that aligns images of the same scene 
taken at different times, from different viewpoints. It 
has been studied in various contexts due to its 
significance in a wide range of areas, including 
medical image fusion, remote sensing, and computer 
vision. Medical image acquisition systems generate 
digital images that can be processed by a computer 
and transferred over computer networks. Digital 
imaging allows extracting objective, quantitative 
parameters from the images by image analysis. 
Medical image analysis exploits the numerical 
representation of digital images to develop image 
processing techniques that facilitate computer-aided 
interpretation of medical images. The continuing 
advancement of image acquisition technology and the 
resulting improvement of radiological image quality 
have led to an increasing clinical need and 
physician's demand for quantitative image 
interpretation in routine practice, imposing new and 
more challenging requirements for medical image 



II. MATERIALS AND METHODS 

A. Neural Network Structures

The Echo state neural network is used for
learning the images. The number of neurons in the
input layer is 4, and the number of neurons in the
output layer is 6.

Input layer description

Node 1 = x coordinate of point in image 2 (target image)

Node 2 = y coordinate of point in image 2 (target image)

Node 3 = x coordinate of point in image 1 (image to be registered with target image)

Node 4 = y coordinate of point in image 1 (image to be registered with target image)

Output layer description

Node 1 = vertical shift

Node 2 = upward (1) or downward (2)

Node 3 = horizontal shift

Node 4 = left (1) or right (2)

Node 5 = angle with respect to axis passing through centre of the image

Node 6 = left (1) or right (2)

The hidden layer has been trained with different
numbers of nodes, increasing from 2 neurons.

The target values corresponding to the x, y values of
image 1 and image 2 are calculated as follows:

TS = size(Directions, 1);

for i = 1:TS-1
    I = Directions(i,:);      % current point
    F = Directions(i+1,:);    % next point
    X = F(1,1) - I(1,1);      % step in x
    Y = F(1,2) - I(1,2);      % step in y
    if X==0 && Y==1
        D(i) = 1;
    elseif X==0 && Y==-1      % source repeats Y==1 here; -1 assumed
        D(i) = 2;
    elseif X==-1 && Y==0
        D(i) = 3;
    elseif X==1 && Y==0
        D(i) = 4;
    elseif X==-1 && Y==-1
        D(i) = 5;
    elseif X==1 && Y==1
        D(i) = 6;
    elseif X==-1 && Y==1
        D(i) = 7;
    elseif X==1 && Y==-1
        D(i) = 8;
    end
end
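The same direction coding can be written as a lookup table, shown here as a Python sketch. It assumes that the second branch of the listing is meant to test for the step (0, -1), since the printed condition duplicates the first; the names `DIRECTION_CODES` and `direction_codes` are hypothetical, and 0 is used for any step that matches none of the eight cases.

```python
# Map a unit step (dx, dy) between consecutive points to a direction code.
DIRECTION_CODES = {
    (0, 1): 1,
    (0, -1): 2,   # assumed; the printed listing repeats (0, 1) here
    (-1, 0): 3,
    (1, 0): 4,
    (-1, -1): 5,
    (1, 1): 6,
    (-1, 1): 7,
    (1, -1): 8,
}


def direction_codes(points):
    """Return one code per consecutive pair of (x, y) points."""
    codes = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        # 0 marks a step that is not one of the eight unit moves.
        codes.append(DIRECTION_CODES.get((x1 - x0, y1 - y0), 0))
    return codes
```

For the point list `[(0, 0), (0, 1), (1, 1), (0, 0)]` the steps are (0, 1), (1, 0) and (-1, -1), so the codes are 1, 4 and 5.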



Table 1 shows the direction of rotation between the
pixel coordinates of the source and target images. The size
of the image considered is 63 rows by 63 columns.
The term 'T' refers to the target image and 'S' refers to
the source image. A curved arrow to the right denotes the
clockwise direction and a curved arrow to the left
the counterclockwise direction. Table 1 shows the
possible rotations of a pixel of the source image to
different locations in the target image.

Table 2 presents 10 sample pixel coordinates that
are used for training the network. For testing the
network, the same sample points together with another 10
points (20 points in total) are presented.

The description of Table 2 is as follows.

Column 1 = pattern number

Column 2 = x coordinate of points in target image

Column 3 = y coordinate of points in target image

Column 4 = x coordinate of points in source image

Column 5 = y coordinate of points in source image

Column 6 = shift in rows

Column 7 = upward or downward translation

Column 8 = shift in columns

Column 9 = horizontal translation

Column 10 = rotation of source pixel coordinate with respect to corresponding target pixel coordinate

Column 11 = clockwise or counterclockwise rotation


Table 1 Rotation of source coordinates from target image coordinates 

Table 2 Patterns used for training and testing ESNN 

Input pattern: target (actual) x, y and source (distorted) x, y. 
Target pattern: translation (pixel) and rotation (degrees). 

Pattern  Target  Target  Source  Source  Vertical  Upward(1)/    Horizontal  Left(1)/   Angle    Direction 
number   x       y       x       y       shift     Downward(2)   shift       Right(2)   rotated  CW(2)/CCW(1) 
1        3       14      1       17      2         1             1           2          3.05     2 
2        5       41      3       42      2         1             1           2          0.59     1 
3        22      47      19      48      3         1             1           2          5.4      1 
4        34      47      32      48      ...       1             1           2          7.59     ... 
10       13      57      12      ...     ...       1             1           2          0.33     1 

Entries marked ... are not legible in the source. Rows 5 to 9 survive only as 
fragments (36, 18, 2, 1, 7.25; 28; 2.56; 47; 0.2; 45; 1.68; 36; 35, 62; 1.88) 
that cannot be assigned to columns with certainty. 



B. ECHO STATE NEURAL NETWORK (ESNN) 

An Artificial Neural Network (ANN) is an 
abstract simulation of a real nervous system that 
contains a collection of neuron units, communicating 
with each other via axon connections. Artificial 
neural networks are computing elements which are 
based on the structure and function of the biological 
neurons. These networks have nodes or neurons 
which are described by difference or differential 
equations. 

Dynamic computational models require the ability 
to store and access the time history of their inputs and 
outputs. The most common dynamic neural 
architecture is the Time-Delay Neural Network 
(TDNN) that couples delay lines with a nonlinear 
static architecture where all the parameters (weights) 
are adapted with the back propagation algorithm. 
Recurrent Neural Networks (RNNs) implement a 
different type of embedding that is largely 
unexplored. RNNs are perhaps the most biologically 
plausible of the Artificial Neural Network (ANN) 
models. One of the main practical problems with 
RNNs is the difficulty to adapt the system weights. 
Various algorithms, such as back propagation 
through time and real-time recurrent learning, have 
been proposed to train RNNs; these algorithms suffer 
from computational complexity, resulting in slow 
training, complex performance surfaces, the 
possibility of instability, and the decay of gradients 
through the topology and time. The problem of 
decaying gradients has been addressed with special 
processing elements (PEs). ESNN possesses a highly 
interconnected and recurrent topology of nonlinear 
PEs that constitutes a reservoir of rich dynamics and 
contains information about the history of input and 
output patterns. The outputs of these internal PEs (echo 
states) are fed to a memoryless but adaptive readout 
network (generally linear) that produces the network 
output. The interesting property of ESNN is that only 
the memoryless readout is trained, whereas the 
recurrent topology has fixed connection weights. This 
reduces the complexity of RNN training to simple 
linear regression while preserving a recurrent 
topology, but it places important constraints 
on the overall architecture that have not yet been fully 
studied. 

The echo state condition is defined in terms of the 
spectral radius (the largest among the absolute values 
of the eigenvalues of a matrix, denoted by $\|\cdot\|$) of the 
reservoir's weight matrix ($\|W\| < 1$). This condition 
states that the dynamics of the ESNN are uniquely 
controlled by the input, and the effect of the initial 
states vanishes. The current design of ESNN 
parameters relies on the selection of the spectral radius. 
There are many possible weight matrices with the 
same spectral radius, and unfortunately they do not 
perform at the same level of mean square error 
(MSE) for functional approximation. 
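The vanishing effect of the initial state can be illustrated with a small pure-Python sketch. It is not the paper's design procedure: the names `make_reservoir`, `step` and `state_gap` are hypothetical, the input is an arbitrary sine wave, and the reservoir is scaled so that its largest absolute row sum is 0.9. That bound is an assumed, stricter stand-in for the spectral-radius condition (it is an upper bound on the spectral radius), chosen because it makes the contraction easy to guarantee without an eigenvalue routine.

```python
import math
import random


def make_reservoir(n, bound=0.9, seed=0):
    """Random n x n reservoir scaled so every absolute row sum is <= bound."""
    rng = random.Random(seed)
    w = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    max_row_sum = max(sum(abs(v) for v in row) for row in w)
    scale = bound / max_row_sum
    return [[v * scale for v in row] for row in w]


def step(w, w_in, x, u):
    """One state update x(n+1) = tanh(W_in * u + W x(n)) for a scalar input u."""
    n = len(x)
    return [math.tanh(w_in[i] * u + sum(w[i][j] * x[j] for j in range(n)))
            for i in range(n)]


def state_gap(steps=200, n=10):
    """Drive two different initial states with the same input; return their final gap."""
    rng = random.Random(1)
    w = make_reservoir(n)
    w_in = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    xa = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    xb = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for t in range(steps):
        u = math.sin(0.1 * t)
        xa = step(w, w_in, xa, u)
        xb = step(w, w_in, xb, u)
    return max(abs(a - b) for a, b in zip(xa, xb))
```

Because tanh never stretches distances and each row of the scaled matrix sums to at most 0.9 in absolute value, the gap between the two trajectories shrinks by at least a factor 0.9 per step, so after a few hundred steps both states depend only on the input history.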



Figure 1 Echo State Network (ESNN) (labels: Input Layer, Read-out) 



ALGORITHM 

1. Read data 
2. Separate into inputs (datain) and target outputs (dataout) 
3. Initialize number of reservoirs 
4. Initialize weight matrices (input to hidden layer, output to hidden 
layer, hidden to hidden layer) 
5. Initialize state vector 
6. Calculate next state = tanh (input matrix * 
input vector + hidden matrix * state + output 
matrix * target output) 
7. Assign next state to present state and repeat 
step 6 and step 7 for each pattern 
8. Find pseudo inverse for the state matrix and 
multiply with targets 
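The algorithm can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the authors' implementation: the names `solve` and `train_readout` are hypothetical, the input and target are scalar sequences, the output feedback term is omitted, and ridge-regularised normal equations stand in for the pseudo inverse of the state matrix.

```python
import math
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def train_readout(inputs, targets, n=6, ridge=1e-3, seed=0):
    """Collect reservoir states for a scalar input sequence and fit a linear readout."""
    rng = random.Random(seed)
    w_in = [rng.uniform(-0.5, 0.5) for _ in range(n)]
    w = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    scale = 0.9 / max(sum(abs(v) for v in row) for row in w)  # keep the reservoir contractive
    w = [[v * scale for v in row] for row in w]
    x = [0.0] * n
    states = []
    for u in inputs:
        # next state = tanh(input weights * input + reservoir weights * state)
        x = [math.tanh(w_in[i] * u + sum(w[i][j] * x[j] for j in range(n)))
             for i in range(n)]
        states.append(x)
    # Normal equations (S^T S + ridge I) w_out = S^T t replace the pseudo inverse.
    a = [[sum(s[i] * s[j] for s in states) + (ridge if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    b = [sum(s[i] * t for s, t in zip(states, targets)) for i in range(n)]
    w_out = solve(a, b)
    preds = [sum(wo * si for wo, si in zip(w_out, s)) for s in states]
    return w_out, preds
```

Only `w_out` is fitted; the input and reservoir weights stay at their random initial values, which is what reduces training to a linear regression.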

The recurrent network is a reservoir of highly 
interconnected dynamical components, states of 
which are called echo states. The memoryless linear 
readout is trained to produce the output. 
Consider the recurrent discrete-time neural network 
given in Figure 1 with M input units, N internal PEs, 
and L output units. The value of the input unit at time 
n is $u(n) = [u_1(n), u_2(n), \ldots, u_M(n)]^T$, 
the internal units are 

$x(n) = [x_1(n), x_2(n), \ldots, x_N(n)]^T$, (1) 

and the output units are 

$y(n) = [y_1(n), y_2(n), \ldots, y_L(n)]^T$. (2) 

The connection weights are given 

• in an N x M weight matrix $W^{in} = (w^{in}_{ij})$ 
for connections between the input and the internal PEs, 
• in an N x N matrix $W = (w_{ij})$ 
for connections between the internal PEs, 
• in an L x N matrix $W^{out} = (w^{out}_{ij})$ 
for connections from PEs to the output units, and 
• in an N x L matrix $W^{back} = (w^{back}_{ij})$ 
for the connections that project back from the output to 
the internal PEs. 
The activation of the internal PEs (echo state) is 
updated according to 

$$x(n+1) = f\left(W^{in} u(n+1) + W x(n) + W^{back} y(n)\right), \qquad (3)$$ 

where $f = (f_1, f_2, \ldots, f_N)$ are the internal PEs' 
activation functions. Here, all $f_i$'s are hyperbolic tangent 
functions $\frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$. The output from the readout 
network is computed according to 

$$y(n+1) = f^{out}\left(W^{out} x(n+1)\right), \qquad (4)$$ 

where $f^{out} = (f^{out}_1, f^{out}_2, \ldots, f^{out}_L)$ are the output units' 
nonlinear functions. Generally, the readout is linear, 
so $f^{out}$ is the identity [6]. The flowcharts for training and 
testing ESNN are given in Figure 2 and Figure 3. 

III. IMAGE REGISTRATION 

Characteristic points in image 1 (source) and 
image 2 (target) are defined. Characteristic points 
are important points through which maximum alignment 
can be achieved. Using them avoids choosing unnecessary 
points, so the ESNN can learn with 
fewer patterns. During training, the x, y 
coordinates of the characteristic points of image 1 
and image 2 are presented at the input layer, and the 
horizontal and vertical shifts along with the angle are given 
at the output layer of the ESNN. 

Implementation steps: 

Training 

Step 1: Identify characteristic points in image 1 
and image 2. 

Step 2: Calculate translation and rotation angle. 

Step 3: Generate training patterns with the 
information obtained in step 1 and step 2. 

Step 4: Train ESNN with training patterns. 

Testing 

Step 5: Present the same set of characteristic 
points and obtain values in the output layer. Find 
the error between obtained and actual values. 
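The translation and rotation targets of the implementation steps can be sketched for a single pair of characteristic points. The sketch below is an assumption about how such targets might be computed, not the authors' procedure: the function name `registration_targets` is hypothetical, x is taken to index rows, the image centre is taken as (32, 32) for a 63 by 63 image, and the angle is the signed difference of the two polar angles about that centre. It is not claimed to reproduce the values in Table 2.

```python
import math


def registration_targets(target_pt, source_pt, centre=(32, 32)):
    """Return (vertical_shift, horizontal_shift, angle_deg) for one point pair."""
    tx, ty = target_pt
    sx, sy = source_pt
    cx, cy = centre
    vertical_shift = abs(tx - sx)      # shift in rows (x is assumed to index rows)
    horizontal_shift = abs(ty - sy)    # shift in columns
    angle_target = math.atan2(ty - cy, tx - cx)
    angle_source = math.atan2(sy - cy, sx - cx)
    diff = math.degrees(angle_target - angle_source)
    diff = (diff + 180.0) % 360.0 - 180.0   # wrap into [-180, 180)
    return vertical_shift, horizontal_shift, diff
```

The sign of the returned angle distinguishes the two rotation directions; which sign corresponds to clockwise depends on the image coordinate convention and is left open here.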
Steps of the training flow chart: 

1. Read a pattern (I) and its target (T) value. 
2. Decide the number of reservoirs. 
3. Decide the number of nodes in the input layer = length of pattern. 
4. Decide the number of nodes in the output layer = number of target values. 
5. Initialize random weights between input and hidden layer (Ih), hidden and output layer (Ho), the reservoir (R), and the state matrix (S). 
6. Calculate F = Ih * I. 
7. TH = Ho * T. 
8. TT = R * S. 
9. S = tanh(F + TT + TH). 
10. a = pseudo inverse (S). 
11. Wout = a * T. 

Figure 2 Flow chart for Training ESNN 



Steps of the testing flow chart: 

1. Read a pattern (I) and its target (T) value. 
2. Decide the number of reservoirs. 
3. Decide the number of nodes in the input layer = length of pattern. 
4. Calculate F = Ih * I. 
5. TH = Ho * T. 
6. TT = R * S. 
7. S = tanh(F + TT + TH). 
8. Wout = a * T. 

Figure 3 Flow chart for testing the ESNN 

IV. RESULTS AND DISCUSSIONS 

The fMRI data have been obtained with standard 
setup conditions. The magnetic resonance imaging of 
a subject was performed with a 1.5-T Siemens 
Magnetom Vision system using a gradient-echo 
echoplanar (EPI) sequence (TE 76 ms, TR 2.4 s, flip 
angle 90°, field of view 256 × 256 mm, matrix size 64 
× 64, 42 slices, slice thickness 3 mm, gap 1 mm), and 
a standard head coil. A checkerboard visual stimulus 
flashing at 8 Hz rate (task condition, 24 s) was 
alternated with a sound (control condition, 24 s). In 
total, 110 samples (3-D volumes) were acquired. 

Figure 4 shows the Heschl's gyrus, auditory 
cortex (target) image slice. This image is rotated 
through 10° clockwise and is treated as the source 
image (Figure 5). Figure 6 to Figure 15 show the 
alignment of the source with the target at each iteration. 
Figure 16 presents the error metrics of variational 
distance, Bhattacharyya distance and harmonic mean, 
and Figure 17 presents the mutual information for the 
alignment using ESNN. 

Fig. 4 Heschl's gyrus, auditory cortex (target) 
Fig. 5 Heschl's gyrus, auditory cortex (10° rotated) (source) 
Fig. 6 First alignment 
Fig. 7 Second alignment 
Fig. 8 Third alignment 
Fig. 9 Fourth alignment 
Fig. 10 Fifth alignment 
Fig. 11 Sixth alignment 
Fig. 12 Seventh alignment 
Fig. 13 Sixth alignment 
Fig. 14 Eighth alignment 
Fig. 15 Final alignment 
Fig. 16 Error metric 
Fig. 17 MI for the alignment using ESNN 

V. CONCLUSION 

This paper describes the implementation of ESNN 
for registration of a Heschl's gyrus, auditory cortex 
image slice. The ESNN takes the least time to learn the 
alignment of characteristic points. 
Abstract — Brain computer interaction could be the interface 
medium of the future, replacing peripheral input/output devices. 
In this work, the brain signals of human subjects were 
recorded under different poses using a Digital 
Electroencephalograph (EEG) 2400NP instrument. In the 
experimental setup, the subjects produced different facial expressions, 
and the corresponding brain signals were recorded with 
digital EEG. An attempt has been made to 
correlate these results with the Facial Action Coding System (FACS). 

Keywords- BCI, EEG, Expression, Facial Coding System. 


I. Introduction 

The most challenging application of brain 
computer interaction is to enable direct interaction between 
human and computer by directly receiving signals from, and 
transmitting signals to, the brain of a human subject. 
Recognizing the facial expression of a 
human subject is a challenging task for a computer system [1][2][3]. 

Expression recognition of the human face raises 
many complex issues: neuromedical, anatomical 
and psychological [4]. It also depends on the social behavior 
of the person. One human being may show different 
expressions under different conditions. A human subject 
may have seven types of universal facial expressions: 
happiness, sadness, fear, anger, surprise, disgust, and neutral, 
so an expression may be considered as a vector in a seven- 
dimensional space [5]. In anatomical FACS, the subjects are 
made to produce expressions that involve the anatomical 
aspects of the face. Facial expressions are controlled by the 
brain, so it is useful to correlate the expression with 
brain activity [6]. Today there are many techniques available for 
direct contact with neural activity, such as Electroencephalography (EEG), 
Magnetoencephalography (MEG) and fMRI [7]. 

II. BCI THROUGH EEG 

In brain computer interaction, electroencephalography 
approximates the cumulative 
electrical activity of neurons; it measures the 
brain's voltage fluctuations as detected by scalp 
electrodes. The technology aims to augment human 
capabilities by enabling a human subject to interact with a 
computer through conscious and spontaneous 
modulation of brain waves after a short training 
period. In a brain computer interaction system, 
cerebral electric activity is recorded via 
electroencephalography: electrodes attached to the scalp 
measure the electric signal of the brain [8]. The 
signals are amplified and transmitted to the computer, 
which transforms them into device control commands. The 
crucial requirement for the successful functioning of 
brain computer interaction is the electric activity at 
the scalp surface. 



Fig 1. Brain computer interaction (blocks: Signal Acquisition, Digitized signal, Signal Processing, Feature Extraction, BCI Application) 
EEG frequency band 

There are five rhythms: 

1) Gamma γ: 30-50 Hz 
2) Beta β: 13-30 Hz 
3) Alpha α: 8-13 Hz 
4) Theta θ: 4-8 Hz 
5) Delta δ: 0.5-4 Hz 
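The five rhythms can be expressed as a small lookup, which is convenient when labelling measured average frequencies. This is a minimal sketch: the name `band_of` is hypothetical, and treating each band as a half-open interval (lower edge included, upper edge excluded) is an assumption, since the text does not say which band a boundary value belongs to.

```python
# (name, lower edge in Hz, upper edge in Hz); upper edge is exclusive by assumption
BANDS = [
    ("delta", 0.5, 4.0),
    ("theta", 4.0, 8.0),
    ("alpha", 8.0, 13.0),
    ("beta", 13.0, 30.0),
    ("gamma", 30.0, 50.0),
]


def band_of(freq_hz):
    """Return the rhythm name for a frequency, or None if it lies outside 0.5-50 Hz."""
    for name, low, high in BANDS:
        if low <= freq_hz < high:
            return name
    return None
```

For example, an average frequency of 23.0 Hz falls in the beta band under this convention.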

EEG Characteristics 

• It measures directly brain function. 

• It has a high temporal resolution, in the range of 
milliseconds. 

• The spatial resolution is in the range of centimeters for 
scalp electrodes, while implanted electrodes can measure 
the activity of single neurons. 

• Scalp electrodes are non-invasive while implanted 
electrodes are invasive. 

• The required equipment is portable. 

Experimental method and procedure 

During the EEG test the subject should be prepared to give 
different types of facial expressions [9]. To conduct the test, the 
scalp must be free from oil. The electrode cap is tied over the 
head using electrode cream, and all 
electrodes are checked for contact with the subject's head. The 
subject is then asked to give the six universal facial expressions [10]; 
for each expression, the signal fluctuation is observed and recorded 
from the four different regions of the brain [11]. 



Fig 4: The international 10-20 system 

The test was conducted for ten minutes, and each participant 
was asked to give the different expressions while imagining 
particular situations portraying the different emotions. 
The low filter of the Neuro portable EEG 
was set at 1 Hz, the high filter at 70 Hz, sensitivity at 7 µV, 
channels 20, sweep speed 30 mm/sec, and Montage set 1 for all 
experiments [13]. The resultant facial expressions of the 
participants were also captured photographically with 
a digital camera. At the same time, the signals of the 
different regions of the brain were mapped. The different 
connection positions for signal monitoring are shown in Fig 
4 [14]. An example of a mapped signal is shown in Fig 5. The 
experiment was conducted in a soundproof environment. 



Fig 2: Electrodes connected on head. Fig 3: Four regions of the brain 
In our experimental setup, fifteen male 
persons in the age group of 6 to 30 years with no psychiatric 
history were selected. The electrode 
cap was placed on the different regions of each person's head. EEG 
was recorded at the following brain sites: the frontal lobe and 
parietal lobe are put together (FP2-F4 (R), FP1-F3 (L)), while the 
occipital lobe (C4-P4 (R), C3-P3 (L)) and the temporal lobe are kept 
separate. The electrodes are FP1, FP2, F3, F4, F7, F8, FTC1, FTC2, C3, C4, T3, 
T4, TCP1, TCP2, T5, T6, P3, P4, PZ, O1, O2, A1, A2. Fig 
4 contains details of the connections for the left and right portions of the 
brain in the international 10-20 system [12]. 



Fig 5: Signals of different brain areas 

Fig 6: EEG signal window during test 
The signals corresponding to different expressions were 
recorded and stored in the computer [15]. The four regions of 
the brain, with left, right, front and back positions, are identified. 
For these positions, the average frequency and 
the average peak voltage are determined through the commercial 
software available with the system. The signals were recorded 
three times for each expression for all 
seven subjects. The average values are given in Table 
1. The EEG 24/NP channel unit used was from Digital 
Neurocompact Medicaid System [16]. 



(IJCSIS) International Journal of Computer Science and Information Security, 

Vol. 9, No. 2, February 2011 
The apparatus used comprises an electrode cable, GND 
plug wire, Phonetic wire, EEG conducting paste, absorbent 
cotton wool LP., and a Sony digital camera HD 5X. 



III. Result and Conclusion 

The goal of the present research work is to present 
experimental work in the BCI direction. In our experimental 
setup, a portable electroencephalograph system has been used 
to record brain signals. In the human subject, 
twenty six muscles are responsible for the overall movement 
of the face. The photographic expressions for each experiment 
are shown in Fig 7. 




Fig 7. Six facial expressions: Happy, Sad, Angry, Fear, Disgust, Surprise 



Brain regions   Position 1                  Position 2                  All four positions result 
                FL (Left)     PL (Right)    OL (Left)     TL (Right)    FL+PL+OL+TL (Left & Right) 
Expressions     F      PV     F      PV     F      PV     F      PV     Avg. of F   Avg. of PV 
1. Neutral      24.5   38.6   22.5   34.0   21.5   34.5   20.5   37     23.0        35.15 
2. Happy        24     44.5   21.9   42.1   24.5   36.2   23.1   32.9   23.37       42.75 
3. Sad          22.5   34.3   26.3   39.6   20.2   33.0   22.1   36.8   25.35       35.92 
4. Anger        19.6   35     26.1   35.6   23.4   31.1   20.4   33.4   22.37       33.77 
5. Fear         19.7   36.8   18.9   34.9   16.6   27.0   16.9   36.1   19.11       35.37 
6. Disgust      19.5   40.2   20.1   35.2   18.5   30.2   19.6   38.2   19.42       35.95 
7. Surprise     13.1   32.4   17.9   34.7   13.9   32.8   13.8   34.7   14.67       33.65 

F = average frequency, PV = average peak voltage. 

Table 1. Results for different facial expressions 
Abstract — A Mobile Ad hoc Network (MANET) is a collection of 
independent mobile nodes that communicate with each other 
via radio waves. Mobile nodes that are in range of each other 
can communicate directly, whereas others need the aid of 
intermediate nodes to route their packets. These networks are 
fully distributed and can work at any place without the help of 
any infrastructure. This property makes them highly 
flexible and robust. An Intrusion Detection System (IDS) is an integral 
part of any MANET, and the IDS must function properly for 
the MANET to function efficiently. In this paper I evaluate the 
cooperative game theory approach for intrusion detection in 
MANETs by comparing it with other existing approaches. My 
evaluation concentrates on intrusions in both the application layer 
and the network layer. Network simulator NS-2.34 is used for the 
simulation of the intrusions in a grid network. 



I. Introduction 

A mobile ad hoc network is defined as a collection of 
mobile platforms or nodes where each node is free to move 
about arbitrarily. Each node logically consists of a router that 
may have multiple hosts and that also may have multiple 
wireless communication devices. The vision of mobile ad hoc 
networking is to support robust and efficient operation in 
mobile wireless networks by incorporating routing 
functionality into mobile nodes. Such networks are envisioned 
to have dynamic, sometimes rapidly-changing, random, multi 
hop topologies which are likely composed of relatively 
bandwidth-constrained wireless links. A MANET may be 
susceptible to varying degrees of intrusion that include passive 
eavesdropping, broadcasting of false routing information, 
disrupting traffic flow, etc. The nodes in the network have to 
cooperate in analyzing the intrusion in a MANET. Thus a 
cooperative Intrusion Detection System, as shown in Figure 1.1, is 
needed to detect any possible intrusions that occur in the 
network and generate an appropriate action. 




Fig 1.1 Grid Architecture Model (legend: cluster head; mobile node sensing intrusions; mobile node not sensing intrusions) 

In this paper, the performance of the cooperative game 
theory approach, which uses the Shapley value algorithm to analyze the 
contribution of each node in detecting an intrusion, is evaluated 
and compared with the anomaly detection approach. The IDS 
constantly monitors the network and reports unusual behavior 
of the network back to the head nodes. It detects 
unusual behavior at the application layer; at the network 
layer, an aggregate function is introduced that computes the severity of the 
attack based on the values reported by the nodes. 
The appropriate measure is taken based on the value of the 
aggregation function. 
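The Shapley value assigns each node its average marginal contribution over all orders in which nodes could join a coalition. The sketch below computes it by brute force for a handful of nodes. It illustrates the general definition only: the name `shapley_values` and the example value functions are hypothetical, and the paper's actual characteristic function for intrusion reports is not given in the text.

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values; `value` maps a frozenset of players to a number."""
    totals = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            # marginal contribution of p when joining this coalition
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: total / len(orders) for p, total in totals.items()}
```

Enumerating all orders costs factorial time, so this form only suits small groups such as the nodes of one grid; larger groups would need sampling or a closed form for the chosen value function.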

Many papers have addressed detecting and 
analyzing intrusions in MANETs, and some have proposed 
game theoretic approaches for monitoring intrusions. A few of 
them are mentioned below. "A Cooperative Approach for 
Analyzing Intrusions in Mobile Ad hoc Networks" by H. Otrok, 
M. Debbabi, C. Assi and P. Bhattacharya (Concordia Univ., 
Montreal) considers the problem of reducing the number of false 
positives generated by cooperative intrusion detection systems 
(IDSs) in mobile ad hoc networks (MANETs). The authors define a 
flexible scheme using security classes, where an IDS is able to 
operate in different modes at each security class. This scheme 
helps in minimizing false alarms and in informing the prevention 
system accurately about the severity of an intrusion. The Shapley 
value is used to formally express the cooperation among all the 
nodes. "A Game Theoretic Formulation for Intrusion Detection 
in Mobile Ad Hoc Networks" by Animesh Patcha and Jung-Min 
presents a game-theoretic model to analyze intrusion detection 
in mobile ad hoc networks. The authors use game theory to model the 
interactions between the nodes of an ad hoc network. They view 
the interaction between an attacker and an individual node as 
a two-player non-cooperative game and construct models for 
such a game. "A Moderate to Robust Game Theoretical Model 
for Intrusion Detection in MANETs" by Hadi Otrok formalizes 
a nonzero-sum noncooperative game theoretical model that 
takes into consideration the tradeoff between security and IDS 
resource consumption. The game solution guides the leader-IDS 
to find the right moment for notifying the victim node to 
launch its IDS once the security risk is high enough. 

To achieve this goal, Bayesian game theory is used to 
analyze the interaction between the leader-IDS and the intruder 
with incomplete information about the intruder. Solving 
such a game yields the threshold value for 
notifying the victim node to launch its IDS once the probability 
of attack exceeds that value. The simulation results reported there 
show that the scheme can effectively reduce IDS resource consumption 
without sacrificing security. Agah et al [4] suggested a game 
theoretic framework for defending nodes in a sensor network. 
Three schemes of defense are designed. In the first scheme the 
authors formulate the attack-defense problem as a two-player, 
nonzero-sum, noncooperative game between an attacker and a 
sensor network. It is shown that this game achieves a Nash 
equilibrium, which leads to a defense strategy for the 
network. In the second scheme they use a Markov decision 
process to predict the most vulnerable sensor node. 

In the third scheme they use an intuitive metric (the node's 
traffic) and protect the node with the highest value of this 
metric. All of the above work focuses on IDS at the network layer of a 
mobile ad hoc network, whereas the cooperative game theory 
approach goes one step further and tries to provide an IDS 
using a cross-layer approach. In my work both application layer 
and network layer information are considered to provide the IDS. 
At the application layer, the grid architecture proposed by 
Vetriselvi et al [5] is considered, and the game theoretic 
approach to providing security for this architecture is included. 

Existing system: 

Mobile ad hoc networks are wireless networks that lack 
infrastructure, which makes them vulnerable to attacks. Intrusion attacks are 
of particular interest and concern to the nodes because they 
seek to render target systems inoperable. Many schemes have 
evolved to detect attacks, but none of them reliably protects the 
nodes from attack. 

Packet dropping: This approach uses the estimated congestion at 
intermediate nodes to decide whether an intermediate node fails to 
forward packets at the desired rate because of congestion or because 
of malicious behavior. It is unclear how well statistical anomaly 
detection will succeed in the wireless domain, which is challenging 
because of dynamic decentralization and the lack of 
concentration points where aggregated traffic can be analyzed. 

Selfish nodes: This is a cooperative enforcement mechanism based 
on a monitoring system; the goal of the model is to 
detect selfish nodes and force them to cooperate. Each node 
keeps track of other nodes' cooperation using reputation as the 
cooperation metric. The system ensures that misbehaving 
nodes are punished by gradually stopping communication 
services, and it provides incentives, in the form of 
reputation, for nodes to cooperate. Reputation is calculated from 
information provided by other nodes involved in each operation. Even so, 
the attacking nodes cannot be stopped, and the scheme is less stable. 

Anomaly detection: Because this model uses a single layer of cluster 
heads, an anomaly may be detected with only weak evidence. In that case a 
global detection process is initiated over a secure channel for further 
investigation of the intrusion. The limitations and 
drawbacks of this model are performance penalties and false 
alarm rates. 

Defending node: In a game theoretic framework, 
three schemes are used for defending nodes in a sensor network. 
In the first scheme the authors formulate the attack-defense 
problem as a two-player, nonzero-sum, noncooperative game 
between an attacker and a sensor network. It is shown that this 
game achieves a Nash equilibrium, which leads to a defense 
strategy for the network. In the second scheme they use a 
Markov decision process to predict the most vulnerable sensor 
node. In the third scheme they use an intuitive metric (the node's 
traffic) and protect the node with the highest value of this 
metric. 



II. Design and working of the Game theory based 
IDS: 

A. The Grid Architecture 

Heterogeneity of the mobile devices can be integrated to 
form an infrastructure known as grid. A grid by definition is a 
system that coordinates resources that are not subject to 
centralized control. Grid consists of three categories of nodes; 
Consumer node CN- Node which requests for a service, 
Service Provider node SPN- Node which processes the service 
requested by the CN, Grid Head node GHN- Node which 
coordinates all the nodes in its grid. This GHN is responsible 
for the allotment of an appropriate service provider node to a 
node requesting for particular service based on parameters such 
as cost, service time, etc. VetriSelvi et al [5] have suggested a 
Grid architecture that efficiently makes use of heterogeneous 
resources in an ad hoc network. A trace based mobility model 
is used to handle the movement of the nodes. Trace Based 
Mobility Model (TBMM) captures the regularity in movement 
as a movement pattern. The nodes that are going to 
communicate exchange this trace information that provides the 
position of the destination and its associated stability time. 
With the help of the trace information as well as the resource 
information appropriate service is provided to consumer nodes. 
Grid Formation and GHN Election 



Any SPN has the privilege to contest for the grid head. A 
SPN starts sending 'Hello' messages to all the nodes within its 
hop limit. A hop limit is specified so as to keep a check on the 
number of nodes in a particular grid and also the density of data 
traffic which will result due to this broadcasting of messages. 
The 'Hello' message contains the stability time of its sender 
and hop count. On receiving a 'Hello' message, any SPN 
which currently does not have a head checks if the sender's 
stability is greater than its own stability. If it is the case it 
simply stops broadcasting its own 'Hello' messages and starts 
broadcasting the newly received message to all the nodes in its 
hop limit range after storing the stability of the sender as the 
'GHN stability'. If not, it simply discards the message and 
continues to broadcast its own 'Hello' message. After finding 
the GHN, it sends 'Grid join' message to GHN. If a SPN node 
is currently functioning under a grid head and receives a 
'Hello' message, it checks to see if the sender's stability is 
higher than its head's stability and if true, it starts broadcasting 
the newly received 'Hello' message after storing the stability as 
'GHN stability'. Any CN on receiving a 'Hello' message 
simply forwards it. All the nodes store the two highest 
stability times that they have received through 'Hello' 
messages. The node with the second highest stability is 
appointed as the 'Secondary head' of the grid. Any node that 
is elected as the GHN should periodically send 'Hello' 
messages to all the other nodes; if it fails to do so, the other 
nodes no longer consider it alive and a re-election takes 
place. 
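The outcome of the election can be summarised in a short sketch. It is a centralised simplification, assuming every contesting SPN's stability time is already known in one place, whereas the scheme itself reaches the same result by flooding 'Hello' messages within the hop limit. The name `elect_heads` and the tie-break by node id are assumptions.

```python
def elect_heads(stabilities):
    """Return (ghn, secondary_head) from a dict of node id -> stability time.

    The most stable node becomes the grid head and the runner-up the
    secondary head. Ties are broken by node id, which is an assumption.
    """
    ranked = sorted(stabilities, key=lambda node: (-stabilities[node], node))
    ghn = ranked[0] if ranked else None
    secondary = ranked[1] if len(ranked) > 1 else None
    return ghn, secondary
```

A re-election after a missed 'Hello' then amounts to calling the function again without the failed head.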

Service Processing 

Any SPN joining a grid submits its resource parameters, 
stability, position, type of service, service cost, etc. to the 
GHN. A CN requesting a service states the type of service 
required and the cost. The GHN maintains a Grid Maintenance 
Table (GMT), in which it stores the status of all the SPNs 
under it: their service parameters and their availability. On 
finding a suitable SPN for the service, the GHN refers the SPN 
id to the requesting CN and assigns a job id to this service. 
The CN then sends a 'Service me' message to the allotted SPN, 
which in turn completes the service and sends a 'Done' message 
to the CN and a 'Comp' message to the GHN indicating the 
completion of its assigned task. The CN sends an 'ACK' 
message to the GHN, acknowledging that the SPN completed 
the service. The GHN then updates the SPN's status in the 
GMT. However, if no appropriate SPN is available for a CN at 
a particular instant, the GHN sends a service denial message 
prompting the CN to retry the service request later. 
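
A minimal sketch of the GMT bookkeeping described above is given below. The names `GridHead`, `SpnEntry`, `request` and `complete` are hypothetical, and picking the cheapest matching SPN is an assumption; the paper only says that a "suitable" SPN is chosen.

```python
from dataclasses import dataclass
from itertools import count
from typing import Dict, Optional, Tuple


@dataclass
class SpnEntry:
    service_type: str
    cost: float
    available: bool = True


class GridHead:
    def __init__(self) -> None:
        self.gmt: Dict[str, SpnEntry] = {}   # Grid Maintenance Table
        self.jobs: Dict[int, str] = {}       # job id -> SPN id
        self._job_ids = count(1)

    def register(self, spn_id: str, service_type: str, cost: float) -> None:
        self.gmt[spn_id] = SpnEntry(service_type, cost)

    def request(self, service_type: str, max_cost: float) -> Optional[Tuple[str, int]]:
        """Return (spn_id, job_id), or None to signal a service denial."""
        candidates = [(e.cost, sid) for sid, e in self.gmt.items()
                      if e.available and e.service_type == service_type
                      and e.cost <= max_cost]
        if not candidates:
            return None
        _, spn_id = min(candidates)          # assumed policy: cheapest SPN
        self.gmt[spn_id].available = False
        job_id = next(self._job_ids)
        self.jobs[job_id] = spn_id
        return spn_id, job_id

    def complete(self, job_id: int) -> None:
        """Call once 'Comp' (from the SPN) and 'ACK' (from the CN) have arrived."""
        spn_id = self.jobs.pop(job_id)
        self.gmt[spn_id].available = True
```

A `None` result corresponds to the service denial message, after which the CN retries later.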

Intrusions in Application Layer 

This paper considers two probable intrusions in the 
application layer: a grid head that is itself malicious, and 
misbehaving service provider nodes. 

1) Malicious GHN: A GHN sends a service busy / service 
denial message to a requesting CN when it does not find a 
suitable SPN. The CN keeps track of the count of the BUSY 
messages sent by the GHN. Once the count exceeds a predefined 
threshold, the CN reports a 'Bad Head' message to the 
secondary head. Every time a GHN allots a service to an SPN, 
the SPN immediately sends a 'busy' message to the 
secondary head. Similarly, after the successful completion of a 
service, the CN sends a 'complete' message to the secondary 
head. Thus the secondary head maintains the list of SPNs 
that are busy. When the secondary head receives the 'Bad 
Head' message from a CN, it checks whether the SPNs are 
actually busy. If not, it generates a 'Ban' message and 
broadcasts it to all the nodes. On receiving this message, all 
the nodes drop that node as their GHN and add its address to 
the list of banned nodes that they maintain, after which a 
re-election takes place to choose the new grid head. 
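
The check performed by the CN and the secondary head can be sketched as below. The class and method names are hypothetical, and two points are assumptions rather than statements from the paper: the 'complete' message is taken to identify the SPN that served the CN, and a denial is treated as unjustified as soon as at least one SPN is idle.

```python
class ClientNode:
    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold           # predefined BUSY threshold
        self.busy_count = 0

    def on_service_denied(self) -> bool:
        """Count a BUSY message; return True when 'Bad Head' should be reported."""
        self.busy_count += 1
        return self.busy_count > self.threshold


class SecondaryHead:
    def __init__(self, spn_ids) -> None:
        self.spn_ids = set(spn_ids)
        self.busy = set()                    # SPNs currently serving a job
        self.banned = set()

    def on_busy(self, spn_id: str) -> None:      # 'busy' sent by an allotted SPN
        self.busy.add(spn_id)

    def on_complete(self, spn_id: str) -> None:  # 'complete' sent by the CN
        self.busy.discard(spn_id)

    def on_bad_head(self, ghn_id: str) -> bool:
        """Return True if a 'Ban' message should be broadcast for ghn_id."""
        idle = self.spn_ids - self.busy
        if idle:                             # SPNs were not actually busy
            self.banned.add(ghn_id)
            return True
        return False
```

A fuller version would also match the service type of the idle SPNs against the denied request, which this sketch leaves out.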



2) Misbehaving SPN: After being allotted a specific SPN 
for its service, a CN sends a 'Service me' message to the SPN. 
A malicious SPN, on receiving this message, does only half the 
service required and reports completion of the service to both 
the GHN and the CN. On discovering that the service was not 
fully completed, the CN generates a report to the GHN stating 
the essential parameters such as the SPN's id, job id, etc. The 
GHN increments its report count for that SPN and waits until 
the count reaches a predefined limit, after which it checks the 
coalitions against the reported node. If there is a winning 
coalition, the GHN adds the SPN to the list of banned nodes 
and broadcasts the message to all other nodes in the network. 

Intrusions in Network Layer 

In the network layer, two highly probable intrusions caused 
by malicious nodes are considered: flooding and flow 
disruption. Both of these intrusions are detected by the other 
nodes, and a coalition is formed to report the intruder. 

1) Flooding attack: A malicious node sends an excessive 
number of route request / route discovery messages to all the 
other nodes. This consumes the network bandwidth and 
paralyses the network. The attack is detected using parameters 
such as the number of control packets expected and received. 
For a certain time interval, the total number of control packets 
received is counted and checked against the threshold limit. If 
the limit is exceeded, the GHN is notified of the possibility of 
an attack. The grid head then forms the coalition, calculates 
the attack value, checks whether it is a winning coalition, and 
so identifies an intrusion. 
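
The threshold test for flooding can be sketched as a sliding-window counter. The name `FloodMonitor`, the window length and the threshold are hypothetical. Counting per sender is also an assumption made so that a suspect can be named; the text only says that the total number of control packets in an interval is compared with a threshold.

```python
from collections import defaultdict, deque


class FloodMonitor:
    def __init__(self, window: float = 1.0, threshold: int = 10) -> None:
        self.window = window                 # length of the time interval (seconds)
        self.threshold = threshold           # allowed control packets per interval
        self.arrivals = defaultdict(deque)   # sender -> recent arrival times

    def on_control_packet(self, sender: str, now: float) -> bool:
        """Record one route request; return True if the GHN should be notified."""
        times = self.arrivals[sender]
        times.append(now)
        # Forget arrivals that fell out of the current interval.
        while times and now - times[0] > self.window:
            times.popleft()
        return len(times) > self.threshold
```

The coalition formation and the attack-value calculation that follow a notification are not modelled here.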

2) Flow disruption attack: A malicious node targets a route 
between a particular source and destination node and starts 
sending junk route discovery messages to all the nodes on that 
route. Certain nodes are randomly chosen as target nodes by 
the attacker nodes. These attacker nodes are a few among the 
nodes that route data packets to and from the target nodes. 
When the ACK messages for the target nodes reach the 
attackers, they drop the packets instead of forwarding them. 
This breaks the route between the source and destination, 
thereby disrupting the flow between the pair of targeted 
nodes. After a stipulated waiting time, the target nodes report 
to their grid head. On receiving the report, the grid head 
carries out the same processing of checking for coalitions and 
spotting a winning coalition. 

Fig 3.1 Block Diagram of Intrusion Detection System 

III. Performance Evaluation with Simulation. 

Simulation studies are carried out to evaluate the 
performance of IDS in grid architecture. For simulation the 
network simulator NS-2.34 is used. 

NS, the network simulator (popularly called ns-2, in 
reference to its current generation), is a discrete event network 
simulator. It is widely used in the simulation of routing and 
multicast protocols, among others, and is heavily used in 
ad hoc networking research. ns supports an array of popular 
network protocols, offering simulation results for wired and 
wireless networks alike. It can also be used as a 
limited-functionality network emulator. It is popular in 
academia for its extensibility (due to its open source model) 
and plentiful online documentation. However, modeling is a 
complex task in ns-2, given the need to learn scripting, 
modeling, etc. NS was built in C++ and provides a simulation 
interface through OTcl, an object-oriented dialect of Tcl. The 
user describes a network topology by writing OTcl scripts, and 
the main NS program then simulates that topology with the 
specified parameters. 

Table 4.1 Parameters for the simulation of IDS 

Parameter            Value 
Number of Nodes      50 
Simulation Time      500 seconds 
Terrain Dimension    (1000, 1000) meters 
Mobility             Random Way Point model 
MAC Protocol         802.11 
Routing Protocol     AODV 

The performance is analyzed by increasing the number of 
reporters, the service time, the number of nodes in the grid 
cluster, and the number of attackers. 

Fig 4.1 Detection Efficiency vs No. of reporters 

The graph in Fig 4.1 compares the proposed scheme with the 
existing system. As the number of reporters increases, the 
detection efficiency also increases. 

Fig 4.2 Intrusion Detected vs Service Time 

The graph in Fig 4.2 shows how the number of intrusions 
detected varies as the service time increases. 

Fig 4.3 Detection Rate of ID in malicious SPN attack 

Fig 4.4 Detection Rate of ID in flow disruption attack 

Fig 4.3 shows that the proposed scheme reaches a detection 
efficiency of 0.98 in the malicious SPN attack. Fig 4.4 shows 
that it reaches a detection efficiency of 0.91 in the flow 
disruption attack. 

IV. CONCLUSION 

We have tested the performance of our system in both the 
network layer and the application layer with the underlying 
grid architecture, and in both cases the results have been 
positive. We have analyzed the simulation results and inferred 
that when more nodes participate in forming coalitions, there 
are better chances of obtaining a good winning coalition, 
thereby enhancing the efficiency of detecting intrusions. Also, 
when the number of nodes in a grid is larger, the detection 
time is shorter. We have also deduced that when the service 
time is shorter, more intrusions are detected. The intrusion 
detection system remains efficient in detecting all attacks with 
a varying number of attackers. These detections are done 
using the Shapley value concept of game theory. The nodes of 
a winning coalition get an equal share of the total gain and 
hence increase their reputation. In the comparison of Fig 4.1, 
our proposed system is more efficient in detection than the 
existing system. 

References 

[1] H. Otrok, M. Debbabi, C. Assi and P. Bhattacharya 
(Concordia Univ., Montreal), "A Cooperative Approach for 
Analyzing Intrusions in Mobile Ad hoc Networks", 27th 
International Conference on Distributed Computing Systems 
Workshops (ICDCSW '07), 22-29 June 2007. 

[2] Animesh Patcha and Jung-Min Park, "A Game Theoretic 
Formulation for Intrusion Detection in Mobile Ad Hoc 
Networks", International Journal of Network Security, 
Vol. 2, No. 2, pp. 131-137, Mar. 2006. 

[3] Hadi Otrok, Noman Mohammed, Lingyu Wang, Mourad 
Debbabi and Prabir Bhattacharya, "A Moderate to Robust 
Game Theoretical Model for Intrusion Detection in MANETs", 
IEEE International Conference on Wireless & Mobile 
Computing, Networking & Communication. 

[4] A. Agah, S. Das and K. Basu, "Intrusion Detection in 
Sensor Networks: A Non-cooperative Game Approach", Proc. 
3rd IEEE International Symposium on Network Computing 
and Applications, IEEE Press, 2004. 

[5] V. VetriSelvi, Shakir Sharfraz and Ranjani Parthasarathi, 
"Mobile Ad Hoc Grid using Trace Based Mobility Model", 
Proceedings of the International Conference on Grid and 
Pervasive Computing (GPC 2007), Springer-Verlag, 
LNCS 4459, France, May 2007, pp. 274-285. 

[6] Xia Wang, "Intrusion Detection Techniques in Wireless Ad 
Hoc Networks", Proceedings of the 30th Annual International 
Computer Software and Applications Conference 
(COMPSAC 2006), IEEE, 2006. 

[7] Seema Bandyopadhyay and Subhajyoti Bandyopadhyay, "A 
Game Theoretic Analysis on the Conditions of Cooperation in 
a Wireless Ad hoc Network", University of Florida, FL, USA, 
2006. 






http://sites.google.com/site/ijcsis/ 
ISSN 1947-5500 



(IJCSIS) International Journal of Computer Science and Information Security, 

Vol. 9, No. 2, February 2011 



Hierarchical Route Optimization by Using Memetic 
Algorithm in Mobile Networks 



K. K. Gautam 
Department of Computer Science & Engineering 
K.P. Engineering College, Agra-283202, India 
E-mail: drkkgautam@gmail.com 



Dileep Kumar Singh 
Department of Computer Science & Engineering 
Dehradun Institute of Technology, Dehradun, India 
E-mail: paras_dileep 19@rediffmail.com 



Abstract-The Network Mobility (NEMO) protocol is a way of 
managing the mobility of an entire network, and the mobile 
Internet Protocol is the basic solution for network mobility. A 
hierarchical route optimization system for mobile networks is 
proposed to solve the management of hierarchical route 
optimization problems. In the present paper we study a 
hierarchical route optimization scheme using a memetic 
algorithm (HROSMA). The concept of optimization, finding 
the extreme of a function that maps candidate 'solutions' to 
scalar values of 'quality', is an extremely general and useful 
idea. For solving this problem, we use a few salient 
adaptations, and we also extend HROSMA to perform routing 
between mobile networks. 

Keywords-Route optimization, memetic algorithm, personal 
area networks, NEMO, IP. 

INTRODUCTION 

In the trend of ubiquitous computing, more and more 
electric appliances and electronic devices capable of 
integrating with wireless communications are being added. 
The mobile Internet Protocol (IP) working group within the 
Internet Engineering Task Force (IETF) has proposed the 
Mobile IP protocol [1], [2] to support host mobility in 
IP-based networks. Mobile IP aims at maintaining internet 
connectivity while a host is moving. The Network Mobility 
(NEMO) protocol is a way of managing the mobility of an 
entire network, viewed as a single unit, which changes its 
point of attachment to the internet [3]. Such a network will 
include one or more mobile routers (MRs) that connect it to 
the global internet. 

A mobile network can have a hierarchical structure; 
in this paper we propose a hierarchical route optimization 
scheme using a memetic algorithm (HROSMA) for mobile 
networks. 

In addition to routing inefficiency, other criteria are 
important in designing a route optimization scheme for mobile 
networks. The concept of network mobility was introduced 
to reduce the signaling overhead of a number of hosts moving 
as a group. 

The NEMO basic support protocol uses a 
bidirectional tunnel between the home agent (HA) and the 
MR, which keeps the mobile network nodes (MNNs) from 
sending all their location registrations simultaneously when 
the MR changes its point of attachment. This characteristic is 
called mobility transparency, which is a very desirable feature 
for a route optimization scheme. 

Mobile networks can have very complex forms of 
hierarchy, e.g. mobile networks in a mobile network, visiting 
mobile nodes (VMNs) in mobile networks, and so on. This 
situation is referred to as a nested mobile network. 

NEMO ARCHITECTURE 

When a mobile network moves from one place to 
another, it changes its point of attachment to the internet, 
which also changes its reachability and the internet topology. 
The NEMO (Network Mobility) working group has come up 
with the NEMO support solution. NEMO support is a 
mechanism that maintains the continuity of sessions between 
mobile network nodes (MNNs) and their correspondent nodes 
(CNs) upon a mobile router's change of point of attachment. 
NEMO support is divided into two parts: 

1. NEMO Basic Support 

2. NEMO Extended Support 

NEMO Basic Support is a solution for preserving 
session continuity by means of bidirectional tunneling 
between the Home Agent (HA) and a mobile network. 
NEMO Extended Support is a solution for providing the 
necessary optimization between arbitrary mobile network 
nodes and correspondent nodes, including routing 
optimization [5]. Not much research has been done on the 
NEMO Extended Support protocol. 

A mobile network is composed of one or more IP 
subnets viewed as a single unit. The mobile router is the 
gateway for communication between the mobile network 
and the internet. 

An Access Router (AR) is a router at the edge of an 
access network which provides a wireless link to mobile 
nodes. A link is simply a physical medium via which data is 
transferred between multiple nodes. A home link is the 
link attached to the interface at the Home Agent on which the 
home prefix is configured. Any link other than the home link 
is a foreign link. A NEMO link is a link within the mobile 
network. 

A Mobile Router has two interfaces: 

Ingress interface: The interface of the MR attached to 
a link inside the mobile network. 

Egress interface: The interface of the MR attached to 
the home link if the MR is at home, and to a foreign link if it 
is in a foreign network. 

The NEMO Basic Support protocol is an extension of 
Mobile IP version 6 (MIPv6) [2]. MIPv6 is a version of the 
Internet Protocol (IP) that supports mobile nodes. 



MOBILE ROUTERS 
A Mobile Router is a router that can change its point 
of attachment to the network by moving from one link to 
another. All the Internet traffic to and from the mobile 
network passes through the Mobile Router. Therefore, Mobile 
Router has to perform certain operations to be able to support 
network mobility. 

HROSMA 
For the hierarchical route optimization scheme 
using tree information option (HROSTIO) we use an auxiliary 
data structure called the MNN-CN (mobile network 
node-correspondent node) list. It is stored at MRs and records 
the relationship of the MNN-C 

BASE STATION 
In a mobile environment, a cell, which is a 
geographical region unit, is covered by the radio frequency of 
a base station (BS). Each cell is controlled by a BS which has 
a fixed connection to a BSC (or RNC). In a mobile network, 
infrastructure elements such as the base station controller 
(BSC), wired links and the mobile switching centre (MSC) are 
employed to provide and maintain essential services; hence 
an interruption in the operation of a network component 
affects overall or partial network services. 

A radiating antenna is classified as omnidirectional 
or directional. With an omnidirectional antenna, a single 
frequency spreads out in all directions, giving 360-degree 
coverage. With directional antennas, a cell is divided into 
sectors, each with a different set of channels. 

SYSTEM STATE OF BASE STATION 
The BS system, including its antenna parts, cannot 
provide part or all of its service functions for the coverage 
cell when one or more fatal failures occur in the BS system. In 
this paper, we consider system failures caused by the key 
distribution method. For example, through interrupt sequence 
mishandling, the overall system operation falls into a failure 
state because of an unanticipated interruption delivered to a 
component of the system. 



MULTIOBJECTIVE OPTIMIZATION (MOO) 

An unconstrained multiobjective optimization 
problem is an example of route optimization for a mobile 
network, because a mobile network moves as a single unit 
with one or more mobile routers that connect it to the global 
internet. We define this problem as 

\[ \text{"minimize"} \quad z = f(x), \]

where 

\[ f(x) = (f_1(x), f_2(x), \ldots, f_n(x)), \]

subject to 

\[ x \in X. \]

Fig-3 An example of a multiobjective optimization problem 
with mobile search space (MSS) X and a vector fitness function 
f that maps solutions in X to objective vectors made up of two 
component (mobile router) 'costs' z_1 and z_2, both to be 
minimized. (Figure labels: axis "minimize z_2"; PF.) 

Here we define 
a = mobile router-1 
b = mobile router-2 

and 
1 = access router-1 
2 = access router-2 

\[ f : X \to Z \]
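
Because f returns a vector of costs, "minimize" has to be read in the Pareto sense: one solution is better than another only if it is no worse in every objective and strictly better in at least one. The following is a minimal sketch of that comparison; the function names `dominates` and `nondominated` and the sample cost pairs are illustrative and not taken from the paper.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimized)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def nondominated(points):
    """Return the objective vectors that no other vector in points dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (z_1, z_2) costs of four candidate routes.
routes = [(1, 5), (2, 2), (3, 3), (5, 1)]
front = nondominated(routes)
```

Here the route with costs (3, 3) is dominated by (2, 2), so `front` keeps the other three routes, none of which is better than another in both costs.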



PERSONAL AREA NETWORK 

A mobile network can have a hierarchical structure, 
e.g. a mobile network within another mobile network. This 
situation is referred to as a nested mobile network. A personal 
area network (PAN) may travel in a vehicle, which also 
contains a mobile network of larger scale; fig 1 illustrates a 
simple case. MR-1 and MR-2 are attached to their own home 
links. A wireless personal area network moves as a single unit 
with one or more mobile routers that connect it to the global 
internet. 



Fig-3 



This figure also shows a routing inefficiency for traffic 
management, which motivates the design of important route 
optimization schemes for traffic management of mobile 
networks. The concept of traffic management for network 
mobility was introduced to reduce the signaling overhead of a 
number of hosts moving as a group behind MRs. 

MEMETIC ALGORITHM APPROACH 
The impressive record of memetic algorithms (MAs) in 
producing high-quality solutions in combinatorial optimization 
and in real-world applications (e.g. see page 220 of [5]) is 
sometimes cited as a testament to their inherent effectiveness 
or robustness as black-box searchers. However, since the 
advent of the no free lunch theorems [6, 7, 8], we know that 
MAs, like any other search algorithm, are only really good 
to the extent to which they can be "aligned" to the specific 
features of a problem, here route optimization in mobile 
networks. Nonetheless, MAs, like their forebears, evolutionary 
algorithms (EAs), do have one unassailable advantage over 
other more traditional search techniques: their flexibility. 
This flexibility has important advantages when solving 
mobile route optimization problems: one alternative is to 
choose some traditional technique and then simplify or 
otherwise alter the problem, whereas a flexible MA can be 
adapted to the problem as it stands. 

As in any other optimization scenario, such as route 
optimization, we should know at the outset what a desirable 
outcome is. The memetic algorithm framework proposed in 
this paper requires that operators and procedures be selected 
based on their current success. 

When a mobile network moves from one place to 
another, it changes its point of attachment to the internet, 
which also changes its reachability and the internet 
topology. 

PERFORMANCE MEASURES IN MAs FOR MOO 

If one is developing or using an algorithm for 
optimization, it almost goes without saying that there should 
be some way to measure its performance. In MOO the 
situation is the same regarding the time aspect of performance 
assessment, but the quality aspect is clearly more difficult. The 
extensive array of existing metaheuristics, issues and methods 
reviewed in the section above gives a richer basis from which 
to design new MAs than do the existing MAs for MOO 
themselves. In a typical cellular network, the area of coverage 
is often geographically divided into hexagonal cells. The cell 
is the basic unit of a cellular system. 

In recent years, Moscato and Krasnogor have provided a 
guiding manifesto for putting the "memetic" back in memetic 
algorithms [9, 10]. 



Algorithm: Candidate MA framework for MOO 

1: MN := Initialize(MN) 
2: A := Nondom(MN) 
3: while stop_criterion not satisfied do 
4:   while stagnation_criterion not satisfied do 
5:     SAMN := SelectFrom(MN ∪ A, sel_sched(succ(SEL))) 
6:     SAMN' := Vary(SAMN, var_sched(succ(VAR))) 
7:     SAMN'' := LocalSearch(SAMN', ls_sched(succ(LS))) 
8:     MN := Replace(MN ∪ SAMN'', rep_sched(succ(REP))) 
9:     A := Reduce(Nondom(A ∪ SAMN''), red_sched(succ(RED))) 
10:  end while 
11:  MN := RandomImmigrants(MN, imm_sched(succ(IMM))) 
12: end while 
13: return (A) 

The algorithm above puts forward a simple framework that 
could serve as a guide for making a more memetic MA for 
MOO. In line 1, the MN (Mobile Networks) population of 
solutions is initialized. As usual, this procedure may be simply 
random or it may employ some heuristic(s). Line 2 sets the 
archive A to the nondominated solutions from MN. 
Thereafter, the main loop of the MA begins. Line 4 sets up 
an inner loop in which a stagnation criterion is checked. This 
should be based on some memeplex which monitors progress 
in diversity, proximity, and/or some other criteria. Lines 5-9 
give a very high-level description of the update of the MN 
and archive. Five different 'schedulers' are employed, 
basically corresponding to mating selection, reproduction, 
lifetime learning, survival selection and update of the archive, 
respectively. Each scheduler chooses from a memeplex of 
operators, based on estimates of the current success of those 
operators. E.g. in line 5, SelectFrom is the operation of mating 
selection, the domain of which is the union of the MN and 
archive, and the co-domain is a Small Area Mobile Network 
(SAMN); the selection is controlled by the scheduler 
sel_sched, which uses a success measure, succ, to choose one 
operator from the set SEL of currently available operators for 
selection. Notice that MN and A are potentially of variable 
size in this scheme. In line 11, the MN is updated using some 
immigration policy to release it from stagnation. The archive 
of nondominated solutions is returned in line 13. 

The framework proposed is rather broad, and actually 
instantiating it requires us to consider how we should resolve 
many choices, including those considered in the following 
sections, at the very least. Table 1 summarises some of the MA 
elements / configuration choices to consider. 
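
One way to instantiate the candidate framework is sketched below on a toy problem. It is an illustration under several assumptions, none of which come from the paper: the evaluation budget used as stop criterion, the stagnation counter, the success-proportional scheduler with exponential smoothing, random replacement in MN, random removal as the Reduce policy, and the one-dimensional test function `f`. Only the variation and local search steps are scheduled here; selection, replacement and immigration use fixed operators for brevity.

```python
import random


def dominates(a, b):
    """True if objective vector a Pareto-dominates b (minimization)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def nondom(solutions, f):
    """Keep the solutions whose objective vectors no other solution dominates."""
    return [s for s in solutions
            if not any(dominates(f(t), f(s)) for t in solutions)]


def reduce_archive(archive, size, rng):
    """Assumed Reduce policy: drop random members until the archive fits."""
    while len(archive) > size:
        archive.pop(rng.randrange(len(archive)))
    return archive


def memetic_moo(f, init, vary_ops, local_ops, pop_size=20, archive_size=10,
                budget=500, stagnation_limit=10, seed=0):
    rng = random.Random(seed)
    success = {op: 1.0 for op in vary_ops + local_ops}   # succ(.) estimates

    def schedule(ops):
        # Scheduler: pick an operator with probability proportional to success.
        return rng.choices(ops, weights=[success[op] for op in ops])[0]

    mn = [init(rng) for _ in range(pop_size)]                    # line 1
    archive = reduce_archive(nondom(mn, f), archive_size, rng)   # line 2
    evals = 0
    while evals < budget:                                        # line 3
        stagnant = 0
        while stagnant < stagnation_limit and evals < budget:    # line 4
            parent = rng.choice(mn + archive)                    # line 5
            vary, local = schedule(vary_ops), schedule(local_ops)
            child = local(vary(parent, rng), f, rng)             # lines 6-7
            evals += 1
            fc = f(child)
            improved = not any(dominates(f(a), fc) or f(a) == fc
                               for a in archive)
            for op in (vary, local):             # update success estimates
                gain = 0.1 if improved else 0.0
                success[op] = max(0.05, 0.9 * success[op] + gain)
            mn[rng.randrange(pop_size)] = child                  # line 8
            if improved:                                         # line 9
                archive = reduce_archive(nondom(archive + [child], f),
                                         archive_size, rng)
            stagnant = 0 if improved else stagnant + 1
        mn[rng.randrange(pop_size)] = init(rng)                  # line 11
    return archive                                               # line 13


# Toy instance: two conflicting costs of a one-dimensional decision x.
def f(x):
    return (x * x, (x - 2.0) ** 2)

def init(rng):
    return rng.uniform(-4.0, 4.0)

def gauss_step(x, rng):
    return x + rng.gauss(0.0, 0.5)

def reset(x, rng):
    return rng.uniform(-4.0, 4.0)

def hill_climb(x, f, rng):
    y = x + rng.gauss(0.0, 0.1)
    return y if dominates(f(y), f(x)) else x   # accept only dominating moves

def no_search(x, f, rng):
    return x


front = memetic_moo(f, init, [gauss_step, reset], [hill_climb, no_search])
```

The returned `front` is a set of mutually nondominated solutions of at most `archive_size` members. Applying this to route optimization would require replacing `f`, `init` and the operators with route-specific ones, which the paper does not specify.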



CONCLUSION 

In this paper, we have proposed a scheme for mobile 
service that makes use of the BS system and a memetic 
algorithm. The survivability of a route optimization scheme in 
a nested mobile network is addressed by modifying the process 
of the memetic algorithm, and the same holds for the 
hierarchical route optimization scheme in a mobile network. 
Hence the NEMO basic support protocol needs to be extended 
with an appropriate route optimization scheme, and this 
optimization problem can be readily handled by a memetic 
algorithm. The scheme we propose achieves hierarchical route 
optimization using a memetic algorithm (HROSMA) for the 
route optimization environment. 

The basic support protocol for hierarchical route 
optimization in mobile networks therefore needs to be 
extended with an appropriate route optimization scheme; the 
scheme we propose provides this for the mobile route 
optimization environment, and may also yield a survivability 
scheme. 

Abstract-In a call admission control scheme, different 
classes of calls may have different bandwidth requirements, 
different requested call holding times and different residence 
times. At any time, each cell of the network has the capability 
to provide service to at least a given number of calls for each 
class of calls. When multibeam directional antennas are 
introduced in this system, many challenging problems arise. 

In this paper we propose a novel network protocol and 
carefully examine the performance of call admission control 
for a multimedia network, for each class of new and handoff 
calls in this mobile network. 

Keywords-Call admission control, Directional antennas, 
Multibeam access point, Multimedia network, Multiple input 
multiple output (MIMO) 

INTRODUCTION 

With the development of multimedia, from the 
standpoint of a system administrator, this property provides an 
alternative for resource planning, especially for bandwidth 
allocation/reallocation in wireless multimedia networks. The 
system may need to block incoming users if all of the 
bandwidth has been used up to provide the highest QoS to 
existing users. In mobile network local access, there is an 
increasing demand to improve throughput and energy 
efficiency for data transmission between terminals and an 
access point. Multibeam smart antennas bring two major 
benefits, spatial reuse and antenna gain, both of which are 
useful in improving mobile communication efficiency. 
Therefore it is of great interest to consider the use of 
multibeam smart antennas in a mobile network, especially 
at the access point. The access point is generally more 
powerful, with fewer physical constraints, than mobile 
terminals. 

Recently there has been some research devoted to 
optimizing the mutual information of a MIMO system with 
interference [4]-[8]. For example, in [7]-[9], signaling methods 
were developed to optimize the mutual information of a MIMO 
system where the user in one cell suffers from co-channel 
interference from the users in other cells. In [10], the problem 
of non-reciprocal interference was recognized in the case of 
adaptive modulation in general. A simple feedback method 
was developed to compensate for it in a single-antenna 
transmission scenario. 

The papers [11]-[12] showed that these two variables 
are dependent and derived a new degradation ratio. They also 
argued that another new performance metric, the frequency of 
switching between different quality levels, should be taken 
into account, because users may be more disturbed by 
frequent switches between quality levels than by poor but 
steady quality. 

To design a cellular mobile network, a comparison 
needs to be made between the performance measures of 
different protocols. A model that provides a multiple-call 
analysis with MIMO is developed for the majority of 
networks. In this paper we formulate and study an adaptive 
call admission control scheme for multimedia mobile 
networks with multiple cells, multiple classes of calls and 
fairness considerations. The cellular network here is 
characterized by the requested call holding time, call residence 
time and new call arrival process, as well as a capacity 
restriction on the number of calls due to limited bandwidth. 
We present the system model and identify the design 
challenges. Then we present our proposed protocol for MIMO. 

CALL ADMISSION CONTROL 
To handle a multiservice multimedia network 
(MMMN), it is very important to employ a call admission 
control (CAC) mechanism. First, call admission control is a 
critical step for the provision of QoS-guaranteed service 
because it can prevent the system capacity from being 
overused. Second, call admission control can help the MMMN 
provide different classes of traffic load with different priorities 
by manipulating their blocking probabilities. In an MMMN 
system, CAC is used to accept or reject connection requests 
based on the state information and the QoS requirements of 
these connections. 
Now we consider a multimedia mobile communication 
network consisting of J connected cells. There are U classes 
of calls (telephone, video, etc., but for convenience we shall 
call all of them calls). The other assumptions and notations for 
this wireless mobile network are as follows. 

(1) The required bandwidth of class u calls (u = 1, ..., U) 
ranges from the minimum bandwidth requirement $b_{ju}$ 
to the maximum bandwidth requirement $B_{ju}$ 
($0 < b_{ju} < B_{ju}$) in cell j (j = 1, 2, ..., J). If a call 
gets the minimum bandwidth for communication, it gets 
the worst but acceptable QoS from the network. 

(2) Cell j consists of $M_j$ channels. To be fair to each 
class of calls in each cell, cell j reserves $K_{ju} b_{ju}$ 
($K_{ju} > 0$) channels for class u calls. Notice that only 
the number of channels, not individual channels, is 
reserved [13]. This implies that at any time cell j will 
have the capability to provide the minimum QoS level 
service for at least $K_{ju}$ class u calls simultaneously. 
Please refer to the conclusion part for an explanation of 
a related situation. 

(3) To give priority to handoff calls, a threshold value 
$T_{ju}$ ($T_{ju} \ge K_{ju}$) in cell j is predetermined 
and specified for class u calls. This threshold value means 
that a class u new call request is admitted if and only if 
(a) the number of class u calls in cell j is less than 
$T_{ju}$, (b) there are at least $b_{ju}$ available channels 
in cell j after possible QoS degradation for other existing 
calls (see II B for the degradation description), and 
(c) the constraint in item 2 above is not violated after 
admitting this class u new call. A handoff request is 
admitted provided the minimum required bandwidth is 
available for this call after possible QoS degradation for 
other calls and the constraint in item 2 above is not 
violated after admitting this class u handoff call. Clearly, 
$T_{ju}$ should satisfy $\sum_{u=1}^{U} T_{ju} b_{ju} < M_j$. 

(4) Class u new calls are generated in cell j according to 
a Poisson process with rate $\lambda_{ju}$, u = 1, ..., U. 
The requested call connection time (RCCT) of a class u 
new call at cell j, $H_{ju}$, which is defined as the total 
length of time that a call initially requests to use a 
channel, is exponentially distributed with mean 
$1/h_{ju}$. The cell residence time of a class u call in 
cell j, $R_{ju}$, which is defined as the length of time a 
call stays in the cell and which depends on the velocity 
and the direction of the mobile terminal, is exponentially 
distributed with mean $1/r_{ju}$. 

(5) The probability that a class u call moves from cell j 
to a neighboring cell k, given that it moves to a 
neighboring cell before the call is completed, is 
$P_{ju,ku}$, where $\sum_{k=1}^{J} P_{ju,ku} = 1$. 

(6) As described above, a class u new call at cell j gains at
least $b_{ju}$ channels for communication if it arrives and
finds that there are fewer than $T_{ju}$ class u calls in the
cell, that at least $b_{ju}$ channels are available, and that
the constraint in item 2 is still not violated after admitting
this class u call. If any of these conditions is not
satisfied, then the new call will be cleared from the network
with probability $r_{ju,0}$ or will push out a class u call in
the cell to a neighboring cell, say cell k, with probability
$r_{ju,ku}$; $r_{ju,ku} > 0$ is possible only when j and k are
neighboring cells (refer to [2] for a similar protocol). It is
worth pointing out that the specific values of the probability
$r_{ju,ku}$ for different systems will depend on the
signal-to-noise ratio at cell j and cell k for class u calls.

(7) A class u handoff call to cell j is admitted for
connection when it arrives and finds at least $b_{ju}$
channels available and the constraint in item 2 above is not
violated after admitting this class u handoff call. Otherwise,
the handoff call will be cleared from the network with
probability $r_{ju,0}$, or will be admitted in cell j by the
system by pushing out a class u call to a cell k with
probability $r_{ju,ku}$.

Note that the protocol above gives priority to handoff calls
as well as fairness for each class of calls. The key
differentiation of the priority comes from the threshold value
$T_{ju}$, and the main differentiation of the fairness comes
from the reservation number $K_{ju}$. The use of the
probability $r_{ju,ku}$ can model several network features.
(1) If a call is blocked at one cell, it may not be blocked by
the network. This is possible in practice because cells often
overlap to ensure complete coverage of the region; when a call
is attempted, the mobile may be situated near the boundary of
two cells and may be close to a third or fourth cell. A
handoff attempt to these neighboring cells is possible when
the first attempt is blocked. This protocol is called directed
retry in [13]. (2) If a call arrives at a cell and finds all
channels busy, it is possible to borrow a channel that does
not interfere with the existing calls. This is called the
simple borrowing strategy in [13]. Some related borrowing
concepts can be found in the hybrid channel assignment
strategy [7]. In this paper we consider the case that
$r_{ju,ku} = r_{ju}\, p_{ju,ku} / (h_{ju} + r_{ju})$, thus
$P(R_{ju} < H_{ju})\, p_{ju,ku} = r_{ju,ku}$. An intuitive
explanation for this assumption is that a class u call pushed
out to a cell follows the same protocol as those class u calls
at cell j that move out of the cell before finishing the call.
We remark that the product form solution presented in this
paper fails if r does not take this form.
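
The relation between the push-out probability and the two exponential times can be checked numerically. For independent exponential times with rates r (cell residence) and h (call connection), the probability that the residence time ends first is r/(h + r). The Python sketch below is illustrative only: the function names, the sample rates and the routing probability are assumptions, not values from the paper.

```python
import random


def push_out_probability(r: float, h: float, p_move: float) -> float:
    """Return r * p / (h + r): leave-before-finish probability times routing probability."""
    return r * p_move / (h + r)


def simulate_leave_before_finish(r: float, h: float,
                                 trials: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo estimate of P(R < H) for exponential R (rate r) and H (rate h)."""
    rng = random.Random(seed)
    hits = sum(rng.expovariate(r) < rng.expovariate(h) for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    # Hypothetical rates: mean residence time 1, mean connection time 1/3.
    print(push_out_probability(1.0, 3.0, 0.5))
    print(simulate_leave_before_finish(1.0, 3.0))
```

The simulated frequency should lie close to the closed-form value r/(h + r), which is the factor multiplying the routing probability in the expression above.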
Example: Suppose there are three classes of calls in a cell and 

the capacity in the cell is 15,30,45,60 the minimum and 

maximum numbers of channels needed by the three classes of 
calls are both 1,2,3,4 and 2,4,6,8 that is, 

h=b 2 =b 3 =1 > 2 > 3 ^B 1 = B 2 = B 3 
define a CAC policy as follows. Let B be the overall beam
width resource (for subscribers k, $B = \sum_k B_k$) and let
M' be the number of traffic classes. We can then define the
system state vector as $n = (n_1, n_2, n_3, \ldots, n_m)$,
where $n_i$ is the number of class i connections in the
system.

Assuming that the beam width requirement of a class i
connection is fixed to $b_i$, the beam width requirement
vector is represented by $b = (b_1, b_2, \ldots, b_m)$.
Therefore an incoming connection will be accepted if
sufficient beam width resources are available.
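
A minimal sketch of this acceptance rule follows. It assumes that "sufficient resources" means the beam width already in use plus the requirement of the incoming connection does not exceed B; the function name and the sample numbers are hypothetical and not taken from the paper.

```python
def admit(state: list[int], requirement: list[int],
          total_beam_width: int, new_class: int) -> bool:
    """Accept a class `new_class` connection if it still fits within the total beam width.

    state[i]       -- number of class i connections currently in the system (n_i)
    requirement[i] -- fixed beam width requirement of one class i connection (b_i)
    """
    in_use = sum(n_i * b_i for n_i, b_i in zip(state, requirement))
    return in_use + requirement[new_class] <= total_beam_width


if __name__ == "__main__":
    # Hypothetical numbers: three classes with requirements 2, 4 and 6.
    print(admit([1, 3, 4], [2, 4, 6], 60, new_class=0))
    print(admit([1, 3, 4], [2, 4, 6], 40, new_class=2))
```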



....=2,4,6 where for simplicity we have dropped the 

cell index. This state space for this cell is 



$2n_1 + 2n_2 < 15$ (1)
$2n_1 + 2n_2 + 2n_3 < 30$ (2)
$2n_1 + 2n_2 + 2n_3 + 2n_4 < 45$ (3)
$2n_1 + 2n_2 + 2n_3 + 2n_4 + 2n_5 < 60$ (4)

Suppose at a call arrival epoch, or a call completion epoch, or 
a call handoff epoch, the new state is (1, 3, 4), which is a 
feasible state from equation (6). From equation (1) it is easily 

seen that a . ( n . )=1. Based on this result and form equation 

(2) and (3), it is ready to drive that jj . (n . ) =2. Finally from 

* 
equation (4), we can figure out that jjj (n . ) =1. 

Therefore based on our channel sharing algorithm, we obtain 
the following channel allocation for state (2, 3, 4...): 

Assign 4 channels to each of the 2 class 1 call, 
Assign 4 channels to each of the 3 class 2 calls, 
Assign 3 channels to each of the other 2 class 2 calls, 
Assign 3 channels to each of the 4 class 3 calls. 
Similarly for nth time. 

MULTIBEAM ACCESS POINT 
Antenna system for CAC: there are two types of multibeam
smart antenna systems. One is based on adaptive arrays and
the other is based on fixed beam directional antennas. In
the present study, we consider a fixed multibeam antenna
system.

Let the antenna system consist of M sectors, each of
which is oriented to provide non-overlapping 360/M degrees
of azimuth coverage. Each sector consists of N narrow
beams with a beam width of approximately 360/MN degrees
per beam, where the beam width of the two edge beams of
each sector may be slightly larger for better coverage. In a
Multimedia Mobile Network (MMMN) system, CAC is
used to accept or reject connection requests based on the
state information, which is defined as 360/MN. And we
Abstract — Business data is a valuable asset for many
organizations. Organizations need security mechanisms that
provide confidentiality when outsourcing their data services.
Encrypting sensitive data is the normal approach in such a
situation. Applications typically use symmetric keys for
encryption, or asymmetric keys for their transmissions. In the
case of asymmetric encryption, they use the public keys of the
signers along with the files sent. Since identity strings are
likely to be much shorter than generated public keys,
identity-based key generation is an appealing option. A
multi-signature scheme enables a group of signers to produce a
compact, joint signature on a common document, and has many
potential uses. Existing multi-signer schemes impose
requirements that make them impractical, such as requiring a
dedicated, distributed key generation protocol amongst
potential users. These requirements limit the use of the
schemes. Multi-party or co-operative authentication on
information is a trusted source of security. In this paper, we
propose an encryption scheme where each authorized user's
information is used to encrypt and decrypt data. This paper
presents a multi-party yet supportive, secure and
identity-based scheme based on symmetric encryption,
Multi-party Supportive Symmetric Encryption (MSSE). This
paper also attempts to resolve the security issues and reports
on the results of the implementation.

Keywords: Symmetric Encryption, Sub-key, Key Management, 
Key generation, Multi-party 



shared key never given to the parties, but be a part of the 
functionality. 

II. MULTIPLE ENCRYPTION 

Multiple encryption is the process of encrypting an already 
encrypted message one or more times, either using the same or 
a different algorithm. Multiple encryption algorithms allow 
users to pick their own logic and the benefit of this approach is 
that if an algorithm turns out to be seriously broken, supporting 
multiple algorithms can make it easier for users to switch. 
Multiple algorithms add more complexity to the application. 

III. MULTI-SIGNATURE SCHEMES 

Multi-signature schemes [2] allow different signers with
public keys to collectively sign a message, yielding a
multi-signature. Multi-signature schemes greatly save on
communication costs. In most applications these public keys
will have to be transmitted along with the multi-signature. The
public keys of all cosigners are needed to verify the validity of
such a multi-signature. The inclusion of information
that uniquely identifies the cosigners seems inevitable for
verification. For example, the signers' user names or IP
addresses could suffice for this purpose; this information may
even already be present in packet headers.



I. INTRODUCTION



Information channels are generally vulnerable to 
eavesdropping and attacks from outsiders. Strong cryptography 
is needed to protect these channels. Traditional access controls 
that provided confidentiality were designed in-house and 
depended on authorization policies. According to Forrester 
Research, enterprise storage needs grow at 52 percent per year 
[1] and organizations chose to outsource their data storage to 
third parties. One of the biggest challenges raised by data 
storage outsourcing was security and trust. Cryptographic 
approach also provided data confidentiality. Encryption is a 
method to securely share data over an insecure network or 
storage site. Users who communicated needed to establish a 
mutually held secret key k. In public key cryptography two 
parties communicated with a public and private key. The 
functionality allowed the parties to establish a shared 
symmetric key and to encrypt and decrypt messages in an ideal 
way using this key. The key was meant to be a long-term 



IV. Identity based signatures 

In an identity-based signature scheme [3], the public key of 
a user is simply his identity, e.g. his name, email or IP address. 
A trusted key distribution center provides each signer with the 
secret signing key corresponding to his identity. When all 
signers have their secret keys issued by the same key 
distribution center, individual public keys become obsolete, 
removing the need for explicit certification and all associated 
costs. These features make the identity-based paradigm 
particularly appealing for use in conjunction with multi- 
signatures, leading to the concept of identity-based multi- 
signature (IBMS) schemes. Application implementations of 
IBMS schemes are rather limited. While pairings have turned 
out extremely useful in the design of cryptographic protocols, 
they were only recently brought to the attention of 
cryptographers [4], and hence did not yet enjoy the same 
exposure to cryptanalytic attacks by experts as other, older 
problems from number theory such as discrete logarithms, 
factoring and RSA. Our scheme is essentially a multi-party co-
operative Symmetric scheme with identity of the participating 
parties. The techniques are strengthened to provide security 
against concurrent. 

V. RELATED WORK 

Diffie and Hellman [5] have argued that the 56-bit key used
in the Federal Data Encryption Standard (DES) [6] is too small
and that current technology allows an exhaustive search of the
$2^{56}$ keys. Double encryption has been suggested to
strengthen DES. A more recent proposal suggests that using two
56-bit keys but enciphering three times (encrypt with a first
key, decrypt with a second key, then encrypt with the first key
again) increases security over simple double encryption. At the
1978 National Computer Conference, Tuchman [7] proposed a
triple encryption method which uses only two keys, K1 and K2.
The plaintext is encrypted with K1, decrypted with K2, then
again encrypted with K1. Schemes that encrypt data on the
client side enable server-side searches on encrypted data. [8]
introduced the first practical scheme for searching on
encrypted data. The scheme enables clients to perform searches
on encrypted text without disclosing any information about the
plaintext to untrusted servers. The untrusted server cannot
learn the plaintext from the encrypted search results. The
basic idea is to generate a keyed hash for the keywords and
store this information inside the ciphertext. The trusted
server can search the keywords by recalculating and matching
the hash value. [9] proposed a scheme to execute SQL queries
over encrypted numeric data that is suitable for exact matches
and also range queries. Its strategy is to store the encrypted
numbers with some index information and to split the query
into a query on the encrypted data, processed by the untrusted
server, and a query on the returned result for post-processing
on the client. [10] presented a scheme for searches on
encrypted data using a public key system that allows mail
gateways to handle email based on whether certain keywords
exist in the encrypted message. The application scenario is
similar to [8], but the scheme uses identity-based encryption
instead of symmetric ciphers. Using asymmetric keys allows
multiple users to encrypt data using the public key, but only
the user who has the secret key can search and decrypt the
data. [11, 12] enable searches on encrypted data by
constructing secure indexes. All the schemes above rely on
secret keys, however, which implies single-user access or
sharing keys among a group of users.
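
The keyed-hash idea mentioned above can be sketched generically as follows. This is not the construction of [8]; it is a simplified illustration using HMAC-SHA256 from the Python standard library, and the function names, the key and the sample keywords are all hypothetical.

```python
import hashlib
import hmac


def keyword_tag(key: bytes, keyword: str) -> bytes:
    """Keyed hash of a keyword; only holders of `key` can compute it."""
    return hmac.new(key, keyword.lower().encode("utf-8"), hashlib.sha256).digest()


def build_index(key: bytes, keywords: list[str]) -> set[bytes]:
    """Tags stored next to the ciphertext so that the server can match queries."""
    return {keyword_tag(key, word) for word in keywords}


def server_match(index: set[bytes], query_tag: bytes) -> bool:
    """The server compares tags only; it never sees the keyword itself."""
    return any(hmac.compare_digest(tag, query_tag) for tag in index)


if __name__ == "__main__":
    secret = b"client-side secret"          # assumed to stay with the client
    index = build_index(secret, ["invoice", "urgent"])
    print(server_match(index, keyword_tag(secret, "urgent")))
    print(server_match(index, keyword_tag(secret, "salary")))
```

Because every tag depends on the secret key, a user without that key cannot form a valid query, which is the single-key limitation noted at the end of the paragraph above.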

VI. Multi-party Supportive Symmetric Encryption 
(MSSE ) 

The basic characteristic of MSSE is the sharing of information
between users in the generation of the key. Each user has his
own information designed as a part of the key. This section
introduces the basic construction of the multi-party supportive
symmetric encryption scheme built upon symmetric
encryption. The notions of security are also discussed and
proofs provided in later sections. The MSSE scheme has its own
unique features. The key features are variable key length,
key dependent rotation, a lengthy key schedule algorithm
and multiple linear functions with a variable number of
rounds.



Fig. 2. MSSE Architecture (inputs: plain text and symmetric key)

VII. KEY GENERATION 

The key will be generated with the sender's, receiver's and
server's names included. Since the key comprises various
components and is a combination of server and client related
information, it is hard for the attacker to guess the key.
The step by step procedure is as follows:

A. KEY GENERATION ALGORITHM

Sender and Receiver agree on two numbers "p" and "g",
where p is a large prime number and g is the base generator.
Sender then chooses his secret odd number, called "a".
Similarly, the Receiver's secret odd number is "b". Sender and
Receiver exchange their numbers. Each party also knows the
other's email ID. Thus the Sender knows p, g, a, b and the
receiver's email ID, and the Receiver knows p, g, b, a and the
sender's email ID.

B. FUNCTION MAIN KEY

INPUT: p, g, a, b, sender's email ID and receiver's email ID

OUTPUT: 512 bit secret key

The first part of the key, $k_1$, is the sender's email ID
converted into its ASCII value in 192 bits or 49 bytes. The
sender computes the key for encryption as
$k_2 = g^b \bmod p$. The third part of the key, $k_3$, is the
receiver's email ID converted into its ASCII value in 192 bits
or 49 bytes. The fourth and final part of the encryption key
is computed as $k_4 = g^a \bmod p$. The secret key is generated
as $K = k_1 \| k_2 \| k_3 \| k_4$, as demonstrated in Fig. 1.



Fig. 1. The 512 bit Encryption key (fields, in order: Email Id of the Sender in 192 bits (49 Bytes); 64 bit key of Receiver; Email Id of the Receiver in 192 bits (49 Bytes); 64 bit key of Sender)



For example, let p = 11, g = 10, a = 5 and b = 8. Then
$K_2 = 10^5 \bmod 11$ would be 10 and $K_4$ =
10 



10 mod 11 would be 



If the email ID of the sender is vnkumar62@yahoo.com, this
would be translated into the following sequence: 118 110 107
117 109 97 114 54 50 64 121 97 104 111 111 46 99 111 109

If the email ID of the receiver is ssdarvind@yahoo.com, this
would be translated into the following sequence: 115 115 100
97 114 118 105 110 100 64 121 97 104 111 111 46 99 111
109
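
The two building blocks of this example, converting an email ID to ASCII codes and computing a modular power, can be reproduced with a few lines of Python. This is only a sketch: the helper names are hypothetical, and how the paper pads each part to its field width is not specified, so no padding is shown.

```python
def ascii_codes(text: str) -> list[int]:
    """ASCII code of every character of an email ID."""
    return [ord(ch) for ch in text]


def to_bits(codes: list[int]) -> str:
    """Render codes as space-separated 8-bit binary groups."""
    return " ".join(format(code, "08b") for code in codes)


def modular_part(g: int, exponent: int, p: int) -> int:
    """Compute g^exponent mod p with Python's built-in three-argument pow."""
    return pow(g, exponent, p)


if __name__ == "__main__":
    print(ascii_codes("vnkumar62@yahoo.com"))
    print(to_bits(ascii_codes("vn")))
    print(modular_part(10, 5, 11))   # g = 10, exponent 5, p = 11
    print(modular_part(10, 8, 11))   # g = 10, exponent 8, p = 11
```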

The key $K = k_1 \| k_2 \| k_3 \| k_4$:

00001010 01110110 01111000 01110101 01110101 

01100001 01110110 00110110 00110010 01000000 

01111001 01100001 01101000 01101111 01101111 

00101010 01100011 01101111 01101101 011110111 

011110111 01100100 00110110 01110110 01110110 

01101001 01111000 01100100 01101000 01101111 

01101111 00101010 01100011 01101111 01101101 
00001010. 

Here a 432 bit key is generated. It will be split into 216
two-bit keys. It will have a minimum of 40 rounds of sub-keys
for one round of the secret key. Approximately 256 x 216, i.e.
50k bytes, of plain text will be converted to cipher text with
one round of the key.

C. KEY SCHEDULING (DIVIDE-KEY FUNCTION)

This function is called the Divide-key function because it
creates two-bit keys from the secret key. The function knows
the length of the secret key in advance and correspondingly
splits the secret key into equal 2 bit sub-keys, as expressed
in equation (1):

$$K_{(1,2,3,4,\ldots,l)} \rightarrow K_{1\text{ to }2},\ K_{3\text{ to }4},\ \ldots,\ K_{l-1\text{ to }l} \qquad (1)$$

where 1, 2, ..., l index the bits from which the sub-keys are
formed and l is the variable length of the key, based on the
sender's and receiver's email IDs and the agreed numbers
p, g, a, b.
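
A minimal sketch of such a Divide-key step is given below. It assumes the secret key is available as a string of '0' and '1' characters with an even length; the function name and this representation are assumptions made for illustration.

```python
def divide_key(key_bits: str) -> list[str]:
    """Split a bit string into consecutive 2-bit sub-keys."""
    if len(key_bits) % 2 != 0:
        raise ValueError("key length must be a multiple of 2 bits")
    if set(key_bits) - {"0", "1"}:
        raise ValueError("key must contain only '0' and '1'")
    return [key_bits[i:i + 2] for i in range(0, len(key_bits), 2)]


if __name__ == "__main__":
    print(divide_key("00001010"))
    print(len(divide_key("01" * 216)))   # a 432-bit key yields 216 sub-keys
```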



D. MSSE ENCRYPTION ALGORITHM

Step 1: Generate 512 bit Secret key using Main_Key function

Step 2: Split the Secret key into 2 bit Sub-keys with Divide-key
Function

Step 3: counters ky = 0, j = 0, kcnt = key length in bits / 2
For i = 0 to msglength do step 512

j = j + 1

C[i] = M[i] SHL // SHL once
C[i] = C[i] SHL // SHL second time
C[i] = C[i] XOR kj // XOR with the two bit sub-key,
padded with zeros to get 8 bits
If j > kcnt then

j = 0
End if
Next i
Step 4: Display C

INPUT: M = (m1 ... m512) plain text and K = (k1 ... k256) 256 bit
Secret key split as 2 bit keys
OUTPUT: C = 512 byte cipher text

E. MSSE DECRYPTION ALGORITHM

Step 1: Generate 512 bit Secret key using Main_Key function

Step 2: Split the Secret key into 2 bit Sub-keys with Divide-key
Function

Step 3: counters ky = 0, j = 0, kcnt = key length in bits / 2
For i = 0 to msglength do step 512

j = j + 1

M[i] = C[i] XOR kj // XOR with the two bit sub-key,
padded with zeros to get 8 bits

M[i] = M[i] SHR // SHR once
M[i] = M[i] SHR // SHR second time
If j > kcnt then

j = 0
End if
Next i
Step 4: Display M

INPUT: C = (c1 ... c512) cipher text and K = (k1 ... k256) 256 bit
Secret key split as 2 bit keys
OUTPUT: M = 512 byte plain text.
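
The round trip of the two procedures above can be illustrated with the following sketch. It makes two loudly stated assumptions that go beyond the pseudocode: the two SHL/SHR steps are implemented as 8-bit rotations (a plain shift would discard bits and could not be undone), and the sub-keys are simply cycled with a modulo index. All names are hypothetical.

```python
def rotl8(byte: int) -> int:
    """Rotate an 8-bit value left by one position (assumed reading of SHL)."""
    return ((byte << 1) | (byte >> 7)) & 0xFF


def rotr8(byte: int) -> int:
    """Rotate an 8-bit value right by one position (assumed reading of SHR)."""
    return ((byte >> 1) | (byte << 7)) & 0xFF


def encrypt(message: bytes, subkeys: list[int]) -> bytes:
    """Rotate each byte left twice, then XOR with a 2-bit sub-key padded to 8 bits."""
    out = bytearray()
    for i, m in enumerate(message):
        k = subkeys[i % len(subkeys)] & 0b11
        out.append(rotl8(rotl8(m)) ^ k)
    return bytes(out)


def decrypt(cipher: bytes, subkeys: list[int]) -> bytes:
    """Undo encrypt: XOR with the same sub-key, then rotate right twice."""
    out = bytearray()
    for i, c in enumerate(cipher):
        k = subkeys[i % len(subkeys)] & 0b11
        out.append(rotr8(rotr8(c ^ k)))
    return bytes(out)


if __name__ == "__main__":
    keys = [0b00, 0b10, 0b11, 0b01]      # hypothetical 2-bit sub-keys
    secret = encrypt(b"MSSE demo", keys)
    print(secret.hex())
    print(decrypt(secret, keys))
```

Decryption applies the inverse operations in reverse order, which is why the XOR comes first in the decryption procedure and last in the encryption procedure.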
VIII. SECURITY ANALYSIS

An attacker (or a software agent) that gains privileged access
to the data storage, or an untrustworthy employee, can intercept
the communications between clients and the server. The
attacker is restricted to passive attacks, i.e. attacks based
upon observed data. In most cases the attacker is isolated from
the users and initialized by the client. The goal of the attacker
is to gather direct or indirect information about the stored data.
The following points ensure the unpredictability of the results
for the attacker:

• The algorithm involves rotating the bits, XORs,
complements and left rotations, ensuring that no two
blocks of cipher text are the same.

• Because the keys change for each block, it is very hard to
perform cryptanalysis on the keys.

• Due to the 512-bit key and 2-bit sub-key, the cipher
becomes more secure, because a total of $2^{256} + 2^n$
permutations are possible, where $256 \ge n \ge 2$. A brute
force attack is therefore very time consuming: nearly
$1.079 \times 10^{28}$ years for a personal computer which
permutes thousands of 128-bit numbers in 1 second for
n = 7. If we increase the value of n, the number of years
required for a brute force attack will increase. The smaller
the value of n, the greater the number of keys generated.
Hence, in both cases, we are optimizing security.

• Since the sub-key changes for every block, secure
key exchange becomes unnecessary, reducing the
network traffic.

• If an attacker is lucky and makes the best guess,
the probability of guessing the key will be $1/2^{128}$ or
$2.938 \times 10^{-39}$; for the number of bits it will be
$1/2^7$ or $7.812 \times 10^{-3}$ when n = 7; and the joint
probability for both will be $(1/2^{128})(1/2^7)$ or
$2.295 \times 10^{-41}$, achieving message confidentiality.
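
The figures quoted in this list can be reproduced with a short calculation. The sketch below assumes that "thousands of 128-bit numbers in 1 second" means 1000 trials per second and that a year has 365 days; both are assumptions made only for this check.

```python
# Probability of guessing a 128-bit key and a 7-bit value in one attempt.
p_key = 1 / 2**128
p_bits = 1 / 2**7
p_joint = p_key * p_bits

# Years needed to enumerate 2^128 values at an assumed 1000 trials per second.
seconds_per_year = 365 * 24 * 3600
years_brute_force = 2**128 / (1000 * seconds_per_year)

if __name__ == "__main__":
    print(f"{p_key:.3e}")
    print(f"{p_bits:.3e}")
    print(f"{p_joint:.3e}")
    print(f"{years_brute_force:.3e}")
```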



IX. CONCLUSION AND FUTURE SCOPE

In this paper, we presented a new data encryption scheme that 
does not require a trusted data server. Unlike previous 
searchable data encryption schemes that require a shared key 
for multi-user access, each user in our system has a unique set 
of keys. The data encrypted by one user can be correctly 
decrypted by all the authorized users in the system. Moreover 
the keys can be easily revoked without any overhead, i.e. 
without having to re-encrypt the stored data. 
Abstract — This paper offers an efficient channel aware
scheduling scheme for the real-time and non-real-time polling
services of IEEE 802.16e Mobile WiMAX. Compared to a similar
scheduling approach, the proposed scheduler can guarantee
and achieve lower delay with a good average throughput. To
achieve this objective, we introduce a scheduling scheme
with four different segments in the decision making process.
The first part is a time dependent function that considers the
time for which packets wait in queues and in a jitter area, to
prevent packets from missing their deadlines. The buffer
utility function, as the second part, considers buffer size in
scheduling to prevent overflow, specifically in the nrtPS
class with large packets. The third part is retrieved from
proportional fairness algorithms, which in normal conditions
give a fair share to users. Channel SNR and service class
weight are also involved in this part. The final part of the
scheduling relationship, channel condition, is defined more
accurately by the RSSI and CINR parameters. The simulation
results in OPNET show that our proposed scheme has a very good
delay and packet loss ratio accompanied by a high throughput.
In another scenario, with a different number of users and
limited resources, we show the relationship between admission
control and scheduling.



Keywords: IEEE 802.16e; WiMAX; Scheduling; QoS; Resource Allocation; OPNET



I. Introduction 

In recent years, bandwidth-hungry applications such as
video and music streaming and large file downloads have
come into widespread use. Wireless and vehicular access to
such content has led companies and standards organizations
such as 3GPP and IEEE to develop broadband wireless access
(BWA) technology. The IEEE 802.16 family of standards, with
long-distance and QoS mechanism support, is among the
important and active technologies for these issues, and is
also counted as a strong 4G candidate under the development
of the 802.16m version of the standard. In our survey, the
IEEE 802.16e mobile WiMAX standard [1] has been studied,
rather than the fixed version, for its special features such
as power management and handover capability. The standard
defines two layers, PHY and MAC, of which the Medium Access
Control layer is responsible for QoS mechanisms such as call
admission control and scheduling. For resource allocation,
channel aware scheduling [2] is a cross-layer process which
uses physical layer parameters such as SNR, CINR and RSSI in
the decision making procedure. Unlike channel unaware
scheduling, which assumes an error-free transmission medium,
in a wireless system, because of its extremely time-varying
nature, we need to consider the channel condition to prevent
waste of resources.

Mobile WiMAX uses TDD mode, which makes channel
estimation easier, and operates in the 2-11 GHz frequency
range. Both access technologies, OFDM and OFDMA, can be used
in WiMAX. We use OFDMA, which increases bandwidth
utilization but makes the scheduling problem more difficult.
A scheduler in OFDM decides per OFDM symbol, and all
subcarriers are allocated to one user; nevertheless, each
subcarrier can select a different modulation and coding
scheme, which makes it difficult to estimate the average rate
in a forward frame. Decision making for OFDMA is much more
difficult than for a time scheduler, because it must also
select subchannels in the frequency domain.

The smallest allocation unit used by the scheduler is called
a slot. In OFDMA, a slot is a combination of some subcarriers
and some OFDM symbols. Depending on the permutation process,
a slot may have different definitions. For instance, for the
uplink in PUSC, each slot consists of 3 OFDM symbols and 16
subcarriers. Another issue is that a burst in the downlink
must be rectangular (Fig. 1), which makes it difficult for a
downlink scheduler to select slots in a manner that best fits
the user allotment [3], [4], [5]. Packing and fragmentation
are other options that can be used by WiMAX equipment to fit
MAC SDUs into MAC PDUs.

Fig. 1. OFDMA Frame in 802.16e

Fig. 2. Admission control and scheduling for user

Both procedures improve utilization of the transmission frame
but increase overhead, and this must be considered in
scheduling. The most important parts of QoS are scheduling and
call admission control (Fig. 2), which are left open in the
standard and are performed by the MAC layer. The medium access
control layer consists of 3 sublayers: a) the convergence
sublayer, b) the common part sublayer and c) the security
sublayer. In the convergence sublayer, packets are classified
and assigned the proper CID and SFID. According to their
requirements and specific parameters, packets are classified
into 5 QoS service classes. 1) UGS: this class gets fixed
bandwidth without any overhead and can guarantee QoS, but
wastes resources when traffic changes. 2) ertPS: this service
is suitable for VoIP traffic with silence suppression and
needs a polling mechanism to signal the end of the silence; as
with UGS, its QoS parameters are maximum latency tolerance,
maximum sustained rate and tolerated jitter. 3, 4) rtPS,
nrtPS: these two service classes are for real-time and
non-real-time variable rate traffic with varying packet sizes,
such as video streaming in rtPS and FTP download in nrtPS. A
polling mechanism is needed to specify what amount of
resources must be granted to the users. For nrtPS, there is no
delay guarantee but minimum throughput is guaranteed. 5) BE:
most traffic is classified into this QoS class. It has no QoS
guarantee, and its queues use the resources remaining after
all other classes have been allocated.

In point to point configuration, WiMAX uses centralized
scheduling (Fig. 3). This means that the base station makes
decisions for both uplink and downlink traffic. Since the
grant for a user is based on GPSS (grant per subscriber
station), another scheduler is needed in the mobile station.
Another grant method, GPC (grant per connection), was made
obsolete in IEEE 802.16e. Before a packet is classified and a
scheduler makes a decision, the call admission control unit
accepts or rejects the new connection according to an estimate
of the system capacity. Clearly, inappropriate capacity
estimation by the CAC unit degrades scheduling performance,
especially when it accepts more than the system's real
capability.

Fig. 3. Centralized scheduling in WiMAX

II. Polling And Related Works 

A. Polling Service 

Bandwidth request methods in WiMAX are categorized as either
implicit or explicit. In mobile WiMAX, there are altogether
11 different ways to request bandwidth. Unsolicited request,
bandwidth stealing, poll-me bit, piggybacking, codeword over
CQICH, CDMA code-based and contention region based requests
are implicit methods. Polling based methods such as unicast
polling, multicast polling, broadcast polling and group
polling are categorized as explicit methods. Guaranteeing QoS
requires information about the user queues, such as buffer
size and head-of-line packet state. When delay and throughput
must be guaranteed in the uplink for a service class, the
suitable bandwidth request method is polling. In this method,
the base station polls users at periodic intervals so that
they can request slots to transmit their packets.

In the polling mechanism, the user's bandwidth must first be
admitted by the admission control unit. According to the QoS
parameters defined for the queues, the base station polls
users at periodic intervals so that they can request
bandwidth. These polls may be addressed to individual SSs
(unicast polling) or to groups of SSs (broadcast or multicast
polling). The polling-based uplink scheduler makes decisions
for the queues and then grants bandwidth according to the
available resources and the number of users. Users are
informed about their grants by decoding the UL-MAP.

Choosing a suitable approach and the delay of the polling
mechanism are the two problems of this method [6]. Unicast
polling prevents request collisions and can guarantee the
delay, but as the number of stations increases, a tremendous
amount of bandwidth is required for polling, which decreases
the bandwidth available for grants.
where $r_i(t)$ is the achievable data rate of user i and
$R_i(t)$ is the average data rate in a given time window
$T_i(t)$. $R_i(t)$ is computed by exponential averaging over
the past $T_i$ windows, as expressed in Eq. 2:



Fig. 4. Polling based bandwidth request mechanism 

Multicast or broadcast polling, which relies on a
contention-based area, is a better approach for a large number
of users, but decreases the throughput of the system.

According to Fig. 4, the polling mechanism in WiMAX is a
3-way handshaking process, which increases delays in the
queues. A user waiting for poll intervals must also wait for
the scheduler's response with grants. In the best case, a
queue's request gains access to the channel after 2 frames.

B. Common Channel Aware QoS Scheduling Algorithms
Fairness, delay, throughput, energy consumption, power
control, complexity and scalability are the important
parameters in scheduler design, evaluation and comparison.
Scheduling in WiMAX can be classified into two main groups,
channel aware and channel unaware methods, both of which can
be used for intra-class and inter-class scheduling. Most
channel unaware scheduling comes from router and CPU
scheduling fundamentals that were extended for WiMAX
[7], [8], [9], [10]. This series of algorithms assumes an
error-free ideal channel for each user and shares resources
according to capacity and QoS parameters. In a wireless
transmission channel, conditions differ for each user and
degrade in the frequency and time domains, so the channel
state must be considered in the resource allocation decision
making process. WiMAX uses the CQI channel to inform the base
station about channel conditions. CQICH information is
primarily used by the adaptive modulation and coding module to
select the best transmission scheme. A channel aware scheme
can also use channel parameters such as RSSI and CINR in
decision making. The algorithms that use channel state fall
into four main categories: proportional fairness based, QoS
guarantee based, power constraint based and system throughput
maximization.

The goal of a PF-based scheme [11] is to achieve long-term
fairness between the queues, especially in the BE service
class, which offers no guarantee of quality of service. In a
PF-based scheme, the user that maximizes Eq. 1 gets the
opportunity to transmit.

$$\frac{r_i(t)}{R_i(t)} \qquad (1)$$

$$R_i(t) = \begin{cases} \left(1 - \dfrac{1}{T_i}\right) R_i(t-1) + \dfrac{1}{T_i}\, r_i(t), & q_i(t) \ne 0 \\ R_i(t-1), & q_i(t) = 0 \end{cases} \qquad (2)$$



$T_i(t)$ has an impact on throughput, but selecting it
accurately is difficult. Proportional fairness algorithms do
not guarantee delay or throughput, and short-term fairness is
not satisfied either. PF is therefore not a suitable method
for delay-sensitive traffic or for applications with a minimum
throughput requirement unless it is modified.
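
A minimal sketch of proportional-fair selection follows. It assumes the common exponential-averaging update of the average rate over a window of T frames, with the average held unchanged while a user's queue is empty; the function names, this update rule and the sample numbers are assumptions for illustration rather than the exact formulation of [11].

```python
def update_average(prev_avg: float, rate: float, window: int, backlog: int) -> float:
    """Exponentially averaged rate; unchanged while the queue is empty (assumed rule)."""
    if backlog == 0:
        return prev_avg
    return (1 - 1 / window) * prev_avg + rate / window


def pick_user(rates: dict[str, float], averages: dict[str, float]) -> str:
    """Select the user with the largest ratio of achievable rate to average rate."""
    return max(rates, key=lambda user: rates[user] / averages[user])


if __name__ == "__main__":
    rates = {"a": 10.0, "b": 4.0}        # achievable rates in this frame
    averages = {"a": 20.0, "b": 2.0}     # averaged rates so far
    chosen = pick_user(rates, averages)
    print(chosen)
    print(update_average(averages[chosen], rates[chosen], window=4, backlog=1))
```

A user with a modest instantaneous rate can still be selected when its average rate is low, which is the source of the long-term fairness, and also the reason no delay bound is guaranteed.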

QoS guarantee algorithms provide delay and throughput 
requirements for each service class that needs QoS. M-LWDF 
families [12] are the most important algorithms in this 
category that try to modify LWDF, in which throughput, are 
optimal. As one of these approaches in [13] queue i that 
maximizes Eq.3 in subchannel k, can get permission to 
transfer its packet. 



Channel_gain(i, k) x HOL_packet_delay(i) x 



d{i) 



(3) 



In this equation, a(i) is the throughput in the coming frame and
d(i) is the average throughput over a specific past time window.
Channel gain is the normalized ratio of the square of noise at
the receiver and the variance of the Additive White Gaussian
Noise. HOL_packet_delay is the waiting time in the buffer of the
packet at the head of the queue. Because this algorithm uses some
buffer state information in its decisions, it is useful for QoS
guarantees.
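
As a rough illustration of Eq. 3, the sketch below evaluates the metric for every queue on one subchannel and picks the maximizer. The dictionary layout, the queue identifiers and the sample values are hypothetical and are not taken from [13].

```python
def eq3_metric(channel_gain, hol_delay, a, d):
    """Channel_gain(i, k) x HOL_packet_delay(i) x a(i) / d(i)."""
    return channel_gain * hol_delay * a / d


def select_queue(queues, gains_on_k):
    """Return the id of the queue that maximizes Eq. 3 on one subchannel.

    queues maps a queue id to (hol_delay, a, d); gains_on_k maps the same
    ids to the channel gain on subchannel k. Both layouts are hypothetical.
    """
    return max(queues, key=lambda i: eq3_metric(gains_on_k[i], *queues[i]))


# Hypothetical values: delays in seconds, throughputs in bits/s.
queues = {"q1": (0.020, 3.0e6, 2.0e6), "q2": (0.080, 1.0e6, 2.0e6)}
gains_on_k = {"q1": 0.9, "q2": 0.6}
winner = select_queue(queues, gains_on_k)
```

A longer head-of-line delay raises a queue's metric, so a queue with a poorer channel can still win once its packets have waited long enough.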



[Figure 5 axes: Utility versus Time; marked regions: Jitter, Deadline]

Fig. 5. Time Utility Function

Another QoS-based approach is the UEPS scheme [14], which uses a
time utility function to raise urgency when packets in the
queues enter the jitter area or approach their deadline. $U_i(t)$
is the time utility function for delay. According to Fig. 5, when
packets enter the jitter area, the first derivative of $U_i(t)$
increases, so Eq. 4 gets more weight and consequently a higher
probability of accessing the resources.

$$ |U_i'(t)| \times \frac{R_i(t)}{\bar{R}_i(t)} \qquad (4) $$



Here $R_i(t)$, like $r_i(t)$ and a(i), is the channel capacity
for queue i; the average rate $\bar{R}_i(t)$ is calculated as in
Eq. 2; and $|U_i'(t)|$ is the absolute value of the first
derivative of the time utility function $U_i(t)$. UEPS does not
consider the buffer size, so the buffer may overflow when the
packet size is big.

In throughput maximization algorithms [14], the user with the
better channel state gets more resources, in a heuristic approach,
without QoS guarantee or fairness. Two important factors in
channel quality are RSSI and CINR, where CINR has more
weight. One of these approaches is MAX CINR, in which
the user with the highest CINR in its channel gets
permission to transmit. Choosing the user with the best channel
condition maximizes throughput, but MAX CINR cannot
guarantee QoS and degrades fairness.

Another category of channel-aware algorithms is the power
constraint based approach, which considers sleep and idle
modes and the limited battery [15]. Power management
increases delay, because the mobile station sleeps
for a specific interval and is available to the base station
only periodically. Packets remain in the buffer until the mobile
station is awake.

III. Soft Tracking Variation of the Parameters of QoS

The scheduling decision consists of 4 main parts; the goal of
defining each part is to meet QoS by tracking important
parameters more accurately. For the polling services rtPS and
nrtPS we do not need two schedulers, one for intra-class and one
for inter-class scheduling. For simplicity, the weights of the two
polling classes are inserted into the main equation. Although this
simplifies scheduling from 2 steps into 1 step, if the parameters
are not assigned properly, the algorithm creates cross points that
degrade scheduling performance.

A. Average Rate Updating

In the first part of the proposed scheduling scheme, we use the
Proportional Fairness relationship, which provides long-term
fairness among users. In normal conditions, when no packet is
near its deadline and no buffer is about to overflow, it is better
to keep fairness among users in obtaining resources. We make
some changes to the PF formula, $R_i(t)/U_i(t)$, to adapt it to the
rtPS and nrtPS service classes. $R_i(t)$ is the achievable data
rate in the coming frame, according to the user's channel and its
SNR in the calculated sub-channel. After adaptive modulation and
coding decides which modulation and coding schemes are suitable
for the user's subcarriers, the scheduler adds them all up to find
the user's data rate in the coming frame. $U_i(t)$ is calculated
by Eq. 5.



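Table I
S_i selection for different polling services

Class of service    S_i
rtPS                (Number of frames needed to transfer packets) x A
nrtPS               (Number of frames needed to transfer packets) x B

$$ U_i(t) = \begin{cases} \left(1 - \frac{1}{S_i}\right) U_i(t-1) + \frac{1}{S_i}\, R_i(t), & Q_i(t) \neq 0 \\ U_i(t-1), & Q_i(t) = 0 \end{cases} \qquad (5) $$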



$S_i$ is the time window over which the average user data rate is
measured. In the proposed scheme, $S_i$ is set according to the
number of frames required to transfer the user's packets, and
the class weight is also inserted in this part. $S_i$ is
selected according to Table I.

A and B are the rtPS and nrtPS weights in resource
allocation. Previously, the class weight was used to make the
decision in inter-class scheduling. When the number of frames
needed to transfer a user's packets increases, the user has less
chance of getting resources. This is what we need for rtPS, whose
packets are small but must be transferred as soon as possible.
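
The window selection of Table I and the update of Eq. 5 can be sketched as below. The weight values chosen for A and B, the helper names, and the way the number of frames is derived from the backlog are assumptions made only for illustration; the paper does not state them.

```python
import math

# Hypothetical class weights standing in for A (rtPS) and B (nrtPS).
CLASS_WEIGHT = {"rtPS": 2.0, "nrtPS": 4.0}


def frames_needed(backlog_bits, bits_per_frame):
    """Number of frames required to drain the backlog (at least one)."""
    return max(1, math.ceil(backlog_bits / bits_per_frame))


def window_size(service_class, backlog_bits, bits_per_frame):
    """Table I: S_i = (frames needed to transfer the packets) x class weight."""
    return frames_needed(backlog_bits, bits_per_frame) * CLASS_WEIGHT[service_class]


def update_average(u_prev, r_now, s_i, queue_len):
    """Eq. 5: update U_i(t) only when the queue Q_i(t) is not empty."""
    if queue_len == 0:
        return u_prev
    return (1.0 - 1.0 / s_i) * u_prev + r_now / s_i


s_rtps = window_size("rtPS", backlog_bits=1000, bits_per_frame=400)
u_new = update_average(u_prev=10.0, r_now=20.0, s_i=5.0, queue_len=3)
```

A larger window makes the average move more slowly, so the ratio $R_i(t)/U_i(t)$ reacts less to a single good frame.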

B. Deadline

rtPS is a real-time service class in which delay-sensitive
traffic is usually classified. The packet size for rtPS is not
big, and if a packet is not transmitted within a specific time
interval, the deadline, the system discards it. The deadline is
counted relative to the waiting time in the buffer: if the waiting
time exceeds a threshold determined by the application, the
packet is discarded by the system. We raise an emergency when
packets are close to their deadline to prevent packet loss.
$T_i(t)$ is the time utility function in Fig. 6, which has 2
different graphs, T1 and T2. In our proposed scheme we use the
first derivative of $T_i(t)$.

[Figure 6 axes: $T_i(t)$ versus Waiting Time; marked regions: Jitter, Deadline]

Fig. 6. Time Utility Function in Proposed scheme

In normal conditions, when a packet is not near its deadline, the
first derivative of $T_i(t)$ is very small and there is no need for
urgent transmission. When a packet stays in the buffer for a long
time and gets closer to its deadline, an emergency occurs. For the
two polling services, we select the time utility function
according to Table II.

Table II
T_i for different polling services

Class of service    T_i
rtPS                T2
nrtPS               T1

Table III
B_i for different polling services

Class of service    B_i
rtPS                B1
nrtPS               B2



For rtPS, the first derivative increases even before the packet's
waiting time enters the jitter area, as shown by T2. Jitter is not
a defined QoS parameter for the nrtPS service class, but in this
part of the scheme we define, according to the application, a
threshold and a jitter area for non-real-time traffic, so that
packets that have waited too long in the queues get permission to
transmit. According to T1, in nrtPS, when a packet enters the
predefined jitter area, the slope of T1 increases more rapidly and
gives more weight in the scheduling decision. In this way we can
guarantee delay for real-time traffic. An important point is that
the deadline in nrtPS is very large, about 3 seconds for example,
whereas in rtPS it is on the order of milliseconds.

C. Buffer overflow

Buffer size and the number of packets in a buffer are
important parameters that must be considered in bandwidth
allocation decisions, especially when the packet size is big.
When packets do not get permission to transmit and the buffer
size is finite, overflow can occur. The delay of a packet
discarded because of overflow is counted as the deadline time,
and this time is added to the average delay and increases it.
Most of the time, buffer overflow degrades system performance
severely.

In this scheme we introduce a buffer utility function, similar to
the time utility function, to manage buffers and involve their
states in the scheduling decision. $B_i(t)$, with two graphs B1 and
B2, is a function that depends on the buffer size and is
illustrated in Fig. 7.



[Figure 7 axes: $B_i(t)$ versus Buffer Size (in bytes); marked points: 85% of Buffer Size, Max Buffer Size]

Fig. 7. Buffer State Utility Function

In the nrtPS service class, the packet size is big and the traffic
is not very sensitive to delay, whereas in rtPS the packet size is
not big and the possibility of overflow in a finite buffer is low.
According to Table III, we select $B_i(t)$ and use the absolute
value of the first derivative of $B_i(t)$ in our proposed scheme
to account for the different polling classes and their impacts.

Overflow is more likely for nrtPS, with its large packet size,
than for rtPS. For this reason, we choose B2 for nrtPS, in
which the slope, and consequently the weight of the buffer,
rises more softly and starts to increase well before overflow.
According to our experiments, for rtPS we do not need to
consider the buffer size, because of the packet size, until 85
percent of the overall buffer size is filled. When the buffer
occupancy reaches 85 percent of the overall size, something like
an emergency has occurred and the slope of B1 increases, which
gives more weight in the scheduling decision.
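
The paper gives the buffer utility curves only graphically, so the following sketch uses invented shapes to show how such a weight might behave: the rtPS weight stays flat until 85 percent occupancy and then rises steeply, while the nrtPS weight grows gently from the start. The functional forms and the constants 60.0 and 4.0 are assumptions, not values from the paper.

```python
def buffer_urgency(fill_ratio, service_class):
    """Hypothetical |dB_i/dx| as a function of buffer occupancy in [0, 1]."""
    if not 0.0 <= fill_ratio <= 1.0:
        raise ValueError("fill_ratio must lie in [0, 1]")
    if service_class == "rtPS":
        # B1-like curve: no extra weight until 85 % occupancy, then a steep rise.
        if fill_ratio < 0.85:
            return 1.0
        return 1.0 + 60.0 * (fill_ratio - 0.85)
    if service_class == "nrtPS":
        # B2-like curve: softer slope that starts growing well before overflow.
        return 1.0 + 4.0 * fill_ratio ** 2
    raise ValueError("unknown service class: " + service_class)


half_full_rtps = buffer_urgency(0.5, "rtPS")
half_full_nrtps = buffer_urgency(0.5, "nrtPS")
nearly_full_rtps = buffer_urgency(0.95, "rtPS")
```

With these shapes a half-full nrtPS buffer already carries more weight than a half-full rtPS buffer, and an rtPS buffer only becomes urgent past the 85 percent mark.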

D. CINR and RSSI

In the average rate updating formula and in $R_i(t)$, we use only
the SNR to account for the channel state. In WiMAX, the noise
power is calculated by Eq. 6.

$$ -174 + 10 \log_{10}\left(BW \times n \times \frac{N_{used}}{N_{FFT}}\right) + \text{NoiseFigure}(\text{Amplifier}) \qquad (6) $$

-174 is the thermal noise density in dBm/Hz for the base station
environment, BW is the overall system bandwidth, n is the
number of OFDM symbols, and $N_{used}/N_{FFT}$ is the fraction of
used subcarriers. SNR alone cannot determine the channel state
accurately. CINR and RSSI are two important channel quality
parameters that are considered to obtain a better channel
estimate for the scheduling decision. CINR is commonly used in
throughput-maximizing scheduling algorithms, and RSSI is mostly
used in power constraint schemes to consider the power in each
subcarrier.

We introduce M(RSSI, CINR), defined by Eq. 7, for a better
consideration of the channel state in the decision.

$$ M(RSSI, CINR) = \left|\log\left(RSSI(mW)\right)\right|^{-1} \times CINR^{2}(mW) \qquad (7) $$

CINR is a more important parameter than RSSI. For example, when
two base stations are close to each other and the user is near
one of them, RSSI is high but CINR is low, which means that the
signal strength is high but the channel condition is not really
good because of interference. A high CINR indicates a better
channel with a low packet loss probability. According to the
ranges of CINR and RSSI, we propose Eq. 7, in which CINR has more
weight.
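
A small numerical sketch of Eq. 7 follows. It assumes a base-10 logarithm, an inverse on the RSSI term and linear-scale inputs; the source equation is partly garbled, so these readings, the helper names and the sample values are assumptions.

```python
import math


def dbm_to_mw(dbm):
    """Convert a power level from dBm to milliwatts."""
    return 10 ** (dbm / 10)


def db_to_linear(db):
    """Convert a ratio from dB to linear scale."""
    return 10 ** (db / 10)


def channel_metric(rssi_mw, cinr_linear):
    """Eq. 7 as read here: |log10(RSSI)|^-1 x CINR^2, CINR weighted more."""
    log_rssi = math.log10(rssi_mw)
    if log_rssi == 0:
        raise ValueError("RSSI of exactly 1 mW makes the metric undefined")
    return cinr_linear ** 2 / abs(log_rssi)


# Strong signal with heavy interference versus a weaker but cleaner link.
near_two_cells = channel_metric(dbm_to_mw(-50), db_to_linear(10))
clean_link = channel_metric(dbm_to_mw(-70), db_to_linear(20))
```

The cleaner link scores far higher even though its RSSI is 20 dB lower, which matches the two-base-station example in the text.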

E. Final relationship

The final relationship of our proposed scheme for the scheduling
decision is given by Eq. 8 for the real-time and non-real-time
classes of service.

$$ |T_i'(t)| \times |B_i'(t)| \times \frac{R_i(t)}{U_i(t)} \times M(RSSI, CINR) \qquad (8) $$
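
Eq. 8 can be read as a per-user priority product. The sketch below selects the backlogged user with the largest product; the `UserState` container, its field names and the sample numbers are hypothetical, and the urgency terms are taken as precomputed inputs because the paper defines the utility curves only graphically.

```python
from dataclasses import dataclass


@dataclass
class UserState:
    time_urgency: float    # |T_i'(t)|, slope of the time utility function
    buffer_urgency: float  # |B_i'(t)|, slope of the buffer utility function
    rate: float            # R_i(t), data rate in the coming frame (bits/s)
    avg_rate: float        # U_i(t), average data rate over the window S_i
    channel_metric: float  # M(RSSI, CINR)
    queue_len: int         # Q_i(t), packets waiting in the buffer


def priority(user):
    """Eq. 8: product of the four terms for one user."""
    return (user.time_urgency * user.buffer_urgency
            * (user.rate / user.avg_rate) * user.channel_metric)


def select_user(users):
    """Index of the backlogged user with the highest priority, or None."""
    backlogged = [j for j, u in enumerate(users) if u.queue_len > 0]
    if not backlogged:
        return None
    return max(backlogged, key=lambda j: priority(users[j]))


users = [
    UserState(1.0, 1.0, 2e6, 1e6, 10.0, 4),   # good rate, no urgency
    UserState(5.0, 1.0, 1e6, 1e6, 10.0, 2),   # packet close to its deadline
    UserState(9.0, 9.0, 1e6, 1e6, 10.0, 0),   # empty queue, never selected
]
selected = select_user(users)
```

Here the second user is served first because its deadline urgency outweighs the first user's better rate ratio.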



As described in the previous sections, $|T_i'(t)|$ is the
absolute value of the first derivative of the time utility
function, $|B_i'(t)|$ is the absolute value of the first derivative
of the buffer-size-dependent function, $R_i(t)$ is the user's data
rate in the coming frame, $U_i(t)$ is the user's average data rate
over the past time window $S_i$, and finally M(RSSI, CINR) is the
channel state function for better measuring the channel
condition. The scheme and the parameters it uses are shown in
Fig. 8.

    At each scheduling instance {
        for j = 1 to N {
            update |dT_j(t)/dx|, |dB_j(t)/dx|, R_j(t), RSSI(j), CINR(j), and U_j(t):
            if (Q_j(t) != 0) { U_j(t) = (1/S_j) * R_j(t) + (1 - 1/S_j) * U_j(t-1) }
            if (Q_j(t) == 0) { U_j(t) = U_j(t-1) }
        }
        QoS_schedule = 0
        for j = 1 to N { if (Q_j(t) > 0) { QoS_schedule = 1 } }
        if (QoS_schedule > 0) {
            IS = arg max_j ( |dT_j(t)/dx| * |dB_j(t)/dx| * M(RSSI(j), CINR(j)) * (R_j(t) / U_j(t)) )
        }
    }

    <Variables>
    T_j(t):   Time utility function for MS j
    B_j(t):   Buffer utility function for MS j
    RSSI(j):  Received signal strength indicator of MS j (mW)
    CINR(j):  Carrier to interference and noise ratio of MS j (mW)
    N:        Number of MSs having a QoS class connection
    R_j(t):   Data rate of MS j in the coming frame (bits/sec)
    IS:       Index of the selected MS
    U_j(t):   Average data rate of MS j over the previous S_j frames
    Q_j(t):   Buffer state of MS j (number of packets)

Fig. 8. Algorithms for proposed scheme

IV. Performance Evaluation

A. Simulation Environment and Parameters

For performance evaluation we developed the WiMAX layers in the
OPNET simulator [16]. Scheduling for the polling-based services
and the BE service is implemented in the wimax_bs_control process
model. First, we compared our proposed scheme with UEPS, PF,
OFDM frame-based PF [17], M-LWDF and MAX CINR in a
simple scenario; then, in a second case, we studied the impact
of admission control on our scheduling scheme. For our
simulation, we used TDD mode with 20 MHz OFDMA access
technology and PUSC permutation in uplink and downlink.
The frame duration is 5 msec with 48 OFDM symbols in each
frame, 12 for uplink and 36 for downlink. TTG is 106 µsec and
RTG is 60 µsec; the number of data subcarriers is 1440 in the
uplink and 1120 in the downlink, out of 2048 subcarriers. The
transmit power of the base station and of the user station is
0.5 watt. A scheduling scheme can be evaluated when resources
are scarce and users compete for bandwidth. There are two
possible ways to limit resources: one is to increase the number
of users, which makes output analysis much more difficult, and
the other is limitation by the admission control unit. In our
simulation the latter is used: through admission control we
create competition for bandwidth.



B. Schedulers for Multiple Traffic Classes 

First, we design a simple scenario to compare our scheme
with the main QoS guarantee and fairness channel-aware
algorithms.




Fig. 9. First simulation scheme in OPNET 

In Fig. 9 we define 3 user stations and one base station that
is directly connected to the server. The stations have 3 different
classes of service, ertPS, rtPS and nrtPS, and resources are
limited by admission control. For the ertPS node we define a voice
application with a frame size of 57 bytes and an inter-arrival
time of 0.25 seconds. In the second node, we implement rtPS
with a video application in which the frame size has a chi-square
distribution with a 32 kbytes mean at 12 frames/sec. At the
third node, an FTP download application is classified into the
nrtPS service class. For this node, we define variable-rate
traffic in which the packet size has an exponential
distribution with an 86 kbytes mean and Poisson inter-arrival
times with a 500 milliseconds mean. All traffic is carried over
TCP/IP, and the headers of these protocols must be included in
the calculations. The channel model for path loss is based on the
Erceg model, and for multipath modeling we use the ITU pedestrian
model. The terrain is mostly flat with light tree density. First,
we evaluate the WiMAX average delay; for our proposed
scheme we consider 2 settings, the first with a variable $S_i$
according to the above-mentioned equation and the second with a
fixed $S_i$ equal to 1000. Fig. 10 shows that the average delay of
our scheme is lower than that of the other schemes, even with a
fixed time window. Whether the time window $S_i$ is fixed or
variable, the average delay does not change in the long run, but
a variable time window causes lower delay at the beginning.
Fig. 11 illustrates the average throughput of the system with
and without M(RSSI, CINR). A better measured channel
condition yields a higher average throughput, because the bit
error rate of the transmissions decreases.

Fig. 10. Average delay in WiMAX system

Fig. 11. Average throughput in Mbps

Fig. 12. Packet loss ratio in WiMAX

MAX-CINR has better throughput than the proposed scheduling,
because the user with the better channel condition gets high
priority without any QoS guarantee. A variable time window in
the average data rate calculation yields better throughput even
if we do not consider M(RSSI, CINR). For nrtPS, buffer
management plays an important role in preventing overflow and
increasing throughput, especially when scheduling resources are
limited and only high priorities get permission.

The packet loss ratio of our scheme is approximately equal to
that of MAX-CINR, as shown in Fig. 12. In Fig. 11 the throughput
is lower than that of MAX-CINR, which indicates a higher bit
error rate. Because different coding schemes, such as CRC and
convolutional coding, are used adaptively with the channel
condition, some lost bits are recovered and consequently the
packet loss ratio does not grow. Since MAX-CINR selects users
with favorable channel states, it hardly needs error correction
schemes, whereas in our scheduling error correction schemes are
helpful and keep the packet loss as low as in MAX-CINR.



C. Impact of Admission Control in scheduling

In this scenario we investigate the impact of the admission
control unit on our scheduling scheme. Channel condition, OFDM
symbol duration, number of subcarriers, frequency, bandwidth
and permutation are the same as in the previous section. 64-QAM
modulation with a 3/4 coding rate is used for all users, whose
distance from the base station is the same, as shown in Fig. 13.

Three types of service are defined for users: ertPS,
rtPS and nrtPS. All users have constant data rates: for ertPS,
16 kbps with a 100-byte packet size; for rtPS, 56 kbps with a
400-byte packet size; and for nrtPS, 120 kbps with an 800-byte
packet size. Users randomly get one of the services in the
network. With the limitation imposed on resources by admission
control, this configuration supports at most 72 users. We force
admission control to cross this line and accept 80. We have
investigated the throughput of the system at different points
with different numbers of users, as shown in Fig. 14.











Fig. 13. Admission control Analysis scheme in OPNET

Fig. 14. Average throughput for different class of service

When the number of users increases from 70 to 80, throughput
degrades badly because admission control has accepted more than
the system's supported capacity and the scheduler cannot
allocate resources properly. This shows a strong relationship
between the scheduler and the admission control unit. The
simulation results cover 15 minutes.



V. Conclusions and Future Study Issues 

PF and OFPF are suitable for the BE service class but cannot
guarantee QoS. They have higher delay and lower throughput
than QoS guarantee algorithms. Since M-LWDF and
UEPS do not consider buffer state and channel condition
exactly, they perform worse than the proposed scheme.
The original UEPS uses one graph to represent emergency, whereas
our scheme uses 2 graphs in the delay and buffer state functions
to remove inter-class scheduling. We have also considered the
channel condition much better, through CINR and RSSI, in
the decision. Simulation results show better delay and
high throughput for the proposed scheme, with nearly the same
packet loss ratio as MAX-CINR, which has the highest
throughput. Admission control proved to be an important unit
for good scheduling performance, and it must estimate channel
capacity precisely.

The power management mechanism, in which idle and
sleep states increase delay, is not considered in our work.
In the real world, power saving and battery considerations are
necessary. The fairness index is another aspect that can be
studied in this kind of work to compare properly with PF-based
schemes. Other important approaches are smart antennas and
MIMO-OFDM, which are widely used in 4G wireless systems and make
scheduling harder in resource allocation. Many more factors
could be considered in resource allocation to improve it, but
they may also create more complexity.


Abstract — Security and privacy issues of the transmitted data
have become an important concern in multimedia technology.
Watermarking, which belongs to the field of information hiding,
has attracted a lot of research interest recently. Watermarking
is used for a variety of reasons including security, content
protection, copyright management, trust management, content
authentication, tamper detection and privacy. Many
watermarking techniques have recently been proposed to support
these applications, but one major issue with most of them
is that they fail in the presence of severe
attacks. This is a major threat to content providers,
because if the digital content is changed dramatically it
becomes difficult to prove the existence of a watermark in it
and, consequently, its ownership. To tackle this threat to
ownership, in this paper we propose two computationally
efficient and secure quantization-based watermarking
algorithms that offer strong performance in the presence of
malicious attacks that try to remove ownership information.
The performance of the proposed techniques is compared with
that of other watermarking techniques; they give very good
perceptual quality, especially at lower bit rates. We present
experimental results which show that the proposed techniques
outperform many techniques for multimedia over wireless
applications. The proposed schemes are supported by these
results.

Keywords: Watermark Detection; Watermarking;
DCT; DWT; Quantization

I. INTRODUCTION 

Watermarking is a method of hiding proprietary 
information in digital media like photographs, digital music, or 
digital video. The ease with which digital content can be 
exchanged over the Internet has created copyright 
infringement issues. Copyrighted material can be easily 
exchanged over peer-to-peer networks, and this has caused 
major concerns for those content providers who produce these 
digital contents. In order to protect the interest of the content 
providers these digital contents can be watermarked. 

The process of embedding a watermark in a multimedia
object is termed watermarking. A watermark can be
considered a kind of signature that reveals the owner of
the multimedia object. Content providers want to embed
watermarks in their multimedia objects (digital content) for
several reasons, such as copyright protection, content
authentication and tamper detection. A watermarking
algorithm embeds a visible or invisible watermark in a given
multimedia object. The embedding process is guided by
a secret key, which decides the locations within the
multimedia object (image) where the watermark is
embedded. Once the watermark is embedded, it can experience
several attacks because the multimedia object can be digitally
processed. The attacks can be unintentional (in the case of
images, low-pass filtering, gamma correction or
compression) or intentional (like cropping). Hence, the
watermark has to be very robust against all these possible
attacks. When the owner wants to check the watermarks in the
possibly attacked and distorted multimedia object, s/he relies
on the secret key that was used to embed the watermark. Using
the secret key, the embedded watermark sequence can be
extracted. This extracted watermark may or may not resemble
the original watermark, because the object might have been
attacked.

Hence, to validate the existence of a watermark, either the 
original object is used to compare and ascertain the watermark 
signal (non-blind watermarking), or a correlation measure is 
used to detect the strength of the watermark signal from the 
extracted watermark (blind watermarking). In correlation 
based detection, the original watermark sequence is compared 
with the extracted watermark sequence, and a statistical 
correlation test is used to determine the existence of the 
watermark. 
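
The correlation test described above can be sketched for binary watermarks as follows. The mapping of bits to +1/-1, the function names and the 0.5 decision threshold are assumptions chosen for illustration; the paper does not specify them.

```python
def normalized_correlation(original_bits, extracted_bits):
    """Correlation in [-1, 1] between two equal-length bit sequences.

    Bits are mapped to +1/-1 so that agreement adds to the score and
    disagreement subtracts from it.
    """
    if len(original_bits) != len(extracted_bits) or not original_bits:
        raise ValueError("sequences must be non-empty and of equal length")
    a = [2 * b - 1 for b in original_bits]
    b = [2 * x - 1 for x in extracted_bits]
    return sum(x * y for x, y in zip(a, b)) / len(a)


def watermark_present(original_bits, extracted_bits, threshold=0.5):
    """Declare the watermark present when the correlation reaches the threshold."""
    return normalized_correlation(original_bits, extracted_bits) >= threshold


original = [1, 0, 1, 0]
intact = normalized_correlation(original, [1, 0, 1, 0])
half_wrong = normalized_correlation(original, [1, 0, 0, 1])
```

An unattacked extraction gives a score of 1, while a sequence that agrees on only half of the bits scores 0 and is rejected by the threshold.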

A. Requirements of Digital Watermarking 

There are three main requirements of digital 
watermarking. They are transparency, robustness and 
capacity. 

Transparency or Fidelity: The digital watermark should
not affect the quality of the original image after it is
watermarked. Cox et al. (2002) define transparency or fidelity
as 'perceptual similarity between the original and the
watermarked versions of the cover work' [1]. Watermarking
should not introduce visible distortions, because such
distortions reduce the commercial value of the image.

Robustness: Cox et al. (2002) define robustness as the
'ability to detect the watermark after common signal
processing operations' [1]. Watermarks could be removed
intentionally or unintentionally by simple image processing
operations like contrast or brightness enhancement, gamma
correction etc. Hence watermarks should be robust against a
variety of such attacks. Attacks can be classified into four
basic categories: attacks that try to remove watermarks totally,
attacks that try to remove the synchronization between the
embedder and the detector, cryptographic attacks and protocol
attacks.

Capacity or Data Payload: Cox et al. (2002) define
capacity or data payload as 'the number of bits a watermark
encodes within a unit of time or work' [1]. This property
describes how much data should be embedded as a watermark
so that it can be detected successfully during extraction. The
watermark should be able to carry enough information to
represent the uniqueness of the image. Different applications
have different payload requirements [1].

Security: According to Kerckhoffs's principle, the security
of a cryptosystem depends on the secrecy of the key and not
on the cryptographic algorithm. The same rule applies to
watermarking algorithms, i.e. the watermarking algorithm can be
public but watermark embedding should be based on a secret key
[2].

To prevent image manipulations and fraudulent use of
modified images, the watermark should survive modifications
introduced by random noise or compression, but should not be
detectable from non-authentic regions of the image. The
original image cannot be used by the watermark detector to
verify the authenticity of the image. In this paper, we
investigate the application of a recently developed
quantization-based watermarking scheme to image
authentication. The two proposed watermarking techniques
allow reliable blind watermark detection from a small number
of pixels, and thus enable the detection of local modifications
to the image content.

II. HISTOGRAM EQUAL AREA DIVISION 
QUANTIZATION TECHNIQUE 

The technique calculates the quantization levels using a
method that depends on the image content (hence the
word "adaptive") and then rounds the pixel values to the
nearest quantization level. In this way, the number of
transmitted values is reduced. The quantization scheme
provides a wide range of compression ratios (CRs) with a very
slight degradation of the signal-to-noise ratio (SNR).

HEAD is a quantization technique in which the number of
transmitted values is reduced by mapping the values of the
image pixels to a finite number of quantization levels.

The HEAD quantization procedure can be listed as follows: 

1. The area under the histogram of the image pixels is divided
into a number of vertical slices with equal areas. Thus each
slice has a width that is inversely proportional to its height.
The number of these slices is equal to the number of
quantization levels.



2. On the horizontal axis of the sliced histogram, each slice 
has start and end points. The midpoint value (on the width) 
of each slice is considered as a quantization level. 

3. In this way, we get a non-uniform quantization in which 
the density of the quantization levels increases in 
proportion to the probability of occurrence of the pixel 
value. 

4. All the pixel values that lie within the width of a slice are 
mapped to the quantization level that is represented by the 
midpoint of this slice. 

The resultant compression ratio and signal-to-noise ratio 
vary depending on the chosen number of quantization levels. 

This technique is irreversible, i.e. the quantized values 
can't be converted back to their original values leading to 
information loss. 
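
The four HEAD steps can be sketched in plain Python as below. Slice boundaries are approximated by quantiles of the sorted pixel values so that each slice covers roughly the same histogram area; that approximation, the function names and the tie rule at slice boundaries are assumptions of this sketch rather than details given in the paper.

```python
from bisect import bisect_right


def head_levels(pixels, num_levels):
    """Return (edges, levels): slice boundaries and their midpoint levels."""
    ordered = sorted(pixels)
    n = len(ordered)
    # Interior boundaries sit at the k/num_levels quantiles, so every slice
    # holds about the same number of pixels (equal area under the histogram).
    edges = [ordered[0]]
    for k in range(1, num_levels):
        edges.append(ordered[(k * n) // num_levels])
    edges.append(ordered[-1])
    # Step 2: the midpoint of each slice width is its quantization level.
    levels = [(edges[k] + edges[k + 1]) / 2 for k in range(num_levels)]
    return edges, levels


def head_quantize(pixels, num_levels):
    """Step 4: map every pixel to the midpoint level of the slice it falls in."""
    edges, levels = head_levels(pixels, num_levels)
    interior = edges[1:-1]
    # A pixel equal to an interior boundary is assigned to the upper slice.
    return [levels[bisect_right(interior, p)] for p in pixels]


sample = [0, 1, 2, 3, 10, 20, 30, 100]
edges, levels = head_levels(sample, 2)
quantized = head_quantize(sample, 2)
```

For this sample the lower slice spans only 0 to 10 while the upper slice spans 10 to 100, showing how slices become narrow where pixel values are dense.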

III. DCT PROPOSED WATERMARKING TECHNIQUE 

The first proposed watermarking scheme is a blind
quantization-based scheme [4]. A block diagram detailing its
steps is shown in Fig. 1. The input N×M image, assumed to be a
matrix of N rows and M columns, is first converted into a single
vector by concatenating successive rows to form one long row
that contains all the image pixels, using a matrix-to-vector
converter. The DCT [5]-[7] is applied to this vector to transform
the image from the spatial domain into the frequency domain, in
which the energy of the image information is concentrated in a
small number of coefficients. The output of the DCT is a vector
that has the same length as the image (the number of pixels in
the image), but with many values close to zero. After
applying the DCT, the output coefficients are arranged in
descending order according to the pixels probabilities. The
output vector of the DCT is then processed by the
histogram equal area quantization technique to choose the
appropriate values used in the watermark embedding process,
the quantization levels. The watermarked coefficient vector is
reshaped and returned to the spatial domain using the IDCT.



[Figure 1 block diagram: N×N input image → DCT → embedding via quantization (inputs: owner seed, binary watermark 01010001110) → IDCT → watermarked image]

Figure 1. The first proposed image watermarking scheme.

A. Watermark Embedding

The steps of watermark embedding can be summarized as
follows:

1. The host image is transformed into the DCT domain; the
transformed coefficients are watermarked using HEAD
quantization with 4 quantization levels $t_0$, $t_1$, $t_2$, and $t_3$.

2. A binary watermark of the same size as the image of
interest is created using a secret key, which is a seed of a
random number generator.

3. Each $W_{ij}$ of the selected DCT coefficients is quantized.

The quantization process can be summarized as follows:

$$ W^{s}_{ij} = \begin{cases} t_2, & x_{ij} = 1 \text{ and } W_{ij} > 0 \\ t_1, & x_{ij} = 0 \text{ and } W_{ij} > 0 \\ -t_3, & x_{ij} = 1 \text{ and } W_{ij} < 0 \\ -t_0, & x_{ij} = 0 \text{ and } W_{ij} < 0 \end{cases} \qquad (1) $$

where $x_{ij}$ is the watermark bit corresponding to $W_{ij}$, and
$W^{s}_{ij}$ is the watermarked coefficient. After all the selected
coefficients are quantized, the inverse discrete cosine
transform (IDCT) is applied and the watermarked image is
obtained.
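
The quantization rule of Eq. 1 and its inverse can be sketched on a plain list of coefficients, leaving the DCT and IDCT steps aside. The function names, the handling of zero-valued coefficients, the tolerance and the sample level values are assumptions made for this sketch.

```python
import random


def make_watermark(seed, length):
    """Binary watermark generated from the secret key used as an RNG seed."""
    rng = random.Random(seed)
    return [rng.randint(0, 1) for _ in range(length)]


def embed(coeffs, bits, t):
    """Eq. 1: replace each coefficient by a signed level chosen by its bit.

    t = (t0, t1, t2, t3). Zero coefficients are left untouched, which is an
    assumption because Eq. 1 does not cover that case.
    """
    t0, t1, t2, t3 = t
    marked = []
    for w, x in zip(coeffs, bits):
        if w > 0:
            marked.append(t2 if x == 1 else t1)
        elif w < 0:
            marked.append(-t3 if x == 1 else -t0)
        else:
            marked.append(w)
    return marked


def extract(marked, t, tol=1e-9):
    """Recover bits from coefficient magnitudes; None marks unselected ones."""
    t0, t1, t2, t3 = t
    bits = []
    for w in marked:
        m = abs(w)
        if abs(m - t2) <= tol or abs(m - t3) <= tol:
            bits.append(1)
        elif abs(m - t0) <= tol or abs(m - t1) <= tol:
            bits.append(0)
        else:
            bits.append(None)
    return bits


levels = (1.0, 2.0, 3.0, 4.0)  # hypothetical t0..t3
marked = embed([5.0, -3.0, 2.0, -7.0, 0.0], [1, 1, 0, 0, 1], levels)
recovered = extract(marked, levels)
```

Because the four levels have distinct magnitudes, the bit can be read back from the magnitude alone, which is what makes detection blind.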

B. Watermark Detection 

1 . The possibly corrupted watermarked image is transformed 
into the DCT domain as in the embedding process. 

2. The extraction is performed on the coefficients. 

3. All the coefficients of magnitude equal to ti, t 2 , - 1 3 and - 1 

's 

are selected; these are denoted W fJ . .The watermark bits 

are extracted from each of the selected DCT coefficients 
with Eq.2. Fig. 2 illustrates the watermark detection 
process. 
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No watermark was inserted into the low-pass sub-band. Unlike 
some non-blind watermarking schemes [9] [10], this scheme 
allows a watermark to be detected without access to the 
original image. It performs an implicit visual masking as only 
wavelet coefficients with large magnitude are selected for 
watermark insertion. These coefficients correspond to regions 
of texture and edges in an image. This scheme makes it 
difficult for a human viewer to perceive any degradation in the 
watermarked image. Also, because wavelet coefficients of 
large magnitude are perceptually significant, it is difficult to 
remove the watermark without severely distorting the 
watermarked image. The most novel aspect of this scheme was 
the introduction of a watermark consisting of pseudorandom 
real numbers. Since watermark detection typically consists of 
a process of correlation estimation, in which the watermark 
coefficients are placed in the image, changes in the location of 
the watermarked coefficients are unacceptable. The 
watermarking scheme proposed by Dugad et al. is based on 
adding the watermark in selected coefficients with significant 
energy in the transform domain in order to ensure the non- 
erasability of the watermark. This scheme has overcome the 
problem of "order sensitivity". 
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Figure 2. Watermark detection in the proposed scheme. 
If |Wy | = t 2 or t 3 , then the recovered watermark bit is a 1 . 

's 

If %• I = t or ti, then the recovered watermark bit is a 

(2) 
4. The recovered watermark is then correlated with the 
original watermark in the watermark file, obtained via the 
secret key. This allows a confidence measure to be 
ascertained for the presence or absence of a watermark in 
an image. 

IV. DWT WATERMARKING TECHNIQUE 

Dugad et al. presented a blind additive watermarking 
scheme operating in the wavelet domain [8]. Three-level 
wavelet decomposition with Daubechies 8-tap filters was used. 



Unfortunately, this scheme also has some disadvantages. It
embeds the watermark in an additive fashion. It is known that 
blind detectors for additive watermarking schemes must 
correlate the possibly watermarked image coefficients with the 
known watermark in order to determine if the image has or has 
not been marked. Thus, the image itself must be treated as 
noise, which makes the detection of the watermark 
exceedingly difficult [8]. In order to overcome this problem, it 
is necessary to correlate a very large number of coefficients, 
which in turn requires the watermark to be embedded into 
several image coefficients at the insertion stage. As a result, 
the degradation in the watermarked image increases. Another 
drawback is that the detector can only tell if the watermark is 
present or not. It cannot recover the actual watermark. 
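
The correlation-based blind detection described above can be illustrated with a minimal sketch. It is a generic additive scheme, not the exact procedure of [8]: the strength `alpha`, the threshold rule, and all function names are assumptions made for illustration.

```python
import random


def make_watermark(key, n):
    """Pseudorandom real-valued watermark regenerated from a secret key."""
    rng = random.Random(key)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def embed_additive(coeffs, watermark, alpha=0.2):
    """Add the watermark to each coefficient, scaled by its magnitude."""
    return [c + alpha * abs(c) * w for c, w in zip(coeffs, watermark)]


def detect(coeffs, watermark, alpha=0.2):
    """Correlate coefficients with the known watermark (no original needed).

    Returns (present, correlation, threshold). The host coefficients act
    as noise in the correlation, so many coefficients are needed before
    the decision becomes reliable.
    """
    n = len(coeffs)
    z = sum(c * w for c, w in zip(coeffs, watermark)) / n
    threshold = alpha / 2 * sum(abs(c) for c in coeffs) / n  # assumed rule
    return z > threshold, z, threshold


# Synthetic "significant" coefficients standing in for a real transform.
rng = random.Random(1)
host = [rng.choice((-1, 1)) * rng.uniform(50, 200) for _ in range(5000)]
wm = make_watermark(42, len(host))
marked = embed_additive(host, wm)
```

The detector only answers whether the watermark is present; it does not return the embedded sequence, which matches the drawback noted above.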

The scheme in [11] is another example of wavelet-based 
watermarking schemes. A noise-like Gaussian sequence is 
used as a watermark. To embed the watermark robustly and 
imperceptibly, watermark components are added to the 
significant coefficients of each selected sub-band by 
considering the human visual system (HVS) characteristics. 
Some small modifications are performed to improve the HVS 
model. The host image is needed in the watermark extraction 
procedure. 

V. PROPOSED DWT WATERMARKING TECHNIQUE 

The discrete wavelet transform (DWT) is a technique by which a 2-D image can be transformed from the spatial domain to the frequency domain. The input N x M image, assumed to be a matrix with N rows and M columns, is passed through the wavelet transform. After a one-level DWT, an image I is decomposed into four subbands: LL, HL, LH, and HH. LL is called the approximation band and contains most of the energy. In the algorithm we decompose the image into four levels and embed the watermark in the HL and LH sub-bands. Here we assume the size of the watermark logo is a multiple of the sub-band size. In the second proposed algorithm, a quantization-based watermarking algorithm, we incorporate implicit visual masking by embedding the watermark in the LH and HL sub-bands. The output vector of the wavelet transform is then processed by the histogram equal area quantization technique to choose the quantization levels, the values used in the watermark embedding process. The watermarked coefficient vector is reshaped and returned to the spatial domain using the IDWT.

Figure 3. The proposed image watermarking scheme.

A. Watermark Embedding

The steps of watermark embedding can be summarized as follows:

1. The host image is transformed into the wavelet domain; a one-level Daubechies wavelet with filters of length 4 is used. The coefficients (excluding the LL1 and HH1 coefficients) are watermarked using HEAD quantization with 4 quantization levels $t_0$, $t_1$, $t_2$, and $t_3$.

2. A binary watermark of the same size as the subbands of interest is created using a secret key, which is the seed of a random number generator.

3. Each $W_{ij}$ of the selected wavelet coefficients is quantized. The quantization process can be summarized as follows:

If $X_{ij} = 1$ and $W_{ij} > 0$, then $W^{s}_{ij} = t_2$,
If $X_{ij} = 0$ and $W_{ij} > 0$, then $W^{s}_{ij} = t_1$,
If $X_{ij} = 1$ and $W_{ij} < 0$, then $W^{s}_{ij} = -t_3$,
If $X_{ij} = 0$ and $W_{ij} < 0$, then $W^{s}_{ij} = -t_0$.    (3)

where $X_{ij}$ is the watermark bit corresponding to $W_{ij}$, and $W^{s}_{ij}$ is the watermarked wavelet coefficient. Figure 3 shows the watermark embedding in a positive wavelet coefficient.

4. After all the selected coefficients are quantized, the inverse discrete wavelet transform (IDWT) is applied and the watermarked image is obtained.

B. Watermark Detection

1. The possibly corrupted watermarked image is transformed into the wavelet domain using the same wavelet transform as in the embedding process.

2. The extraction is performed on the coefficients of the first-level wavelet transform (excluding the LL1 subband).

3. All the coefficients equal to $t_1$, $t_2$, $-t_3$ and $-t_0$ are selected; these are denoted $W^{s}_{ij}$. The watermark bits are extracted from each of the selected wavelet coefficients with Eq. 4. Fig. 4 illustrates the watermark detection process.

Figure 4. Watermark detection in the proposed scheme.

If $|W^{s}_{ij}| = t_2$ or $t_3$, then the recovered watermark bit is a 1.
If $|W^{s}_{ij}| = t_0$ or $t_1$, then the recovered watermark bit is a 0.    (4)

4. The recovered watermark is then correlated with the original watermark in the watermark file, obtained via the secret key, only in the locations of the selected coefficients. This allows a confidence measure to be ascertained for the presence or absence of a watermark in an image.
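
The per-coefficient rules of Eq. 3 and Eq. 4 can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the rule for choosing which coefficients are "selected" is not modelled, a zero coefficient is left unchanged, and the tolerance `tol` is an assumed allowance for rounding after the inverse transform.

```python
# Quantization levels T0..T3 reported for the cameraman experiment.
LEVELS = (113, 124, 156, 159)


def embed_bit(w, bit, levels=LEVELS):
    """Quantize one selected coefficient w for watermark bit `bit` (Eq. 3)."""
    t0, t1, t2, t3 = levels
    if w > 0:
        return t2 if bit == 1 else t1
    if w < 0:
        return -t3 if bit == 1 else -t0
    return w  # assumption: a zero coefficient is left untouched


def extract_bit(ws, levels=LEVELS, tol=0.5):
    """Recover the bit from a coefficient (Eq. 4); None if it is not marked."""
    t0, t1, t2, t3 = levels
    mag = abs(ws)
    if abs(mag - t2) <= tol or abs(mag - t3) <= tol:
        return 1
    if abs(mag - t0) <= tol or abs(mag - t1) <= tol:
        return 0
    return None


coeffs = [130.0, 140.5, -120.0, -150.2]   # made-up selected coefficients
bits = [1, 0, 1, 0]                       # made-up watermark bits
marked = [embed_bit(w, b) for w, b in zip(coeffs, bits)]
recovered = [extract_bit(w) for w in marked]
```

Because the bit is read directly from the quantized magnitude, extraction needs neither the original image nor the original coefficient values.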

VI. PERCEPTUAL QUALITY METRICS 

Two metrics for ascertaining the quality of a watermarked 
image are highlighted in this section. These metrics are the 
Mean Square Error (MSE), and the Peak Signal to Noise Ratio 
(PSNR). The MSE measures the average pixel-by-pixel 
difference between the original image (I) and the watermarked 

image ($\tilde{I}$) [12].

$$\mathrm{MSE} = \frac{1}{MN} \sum_{m} \sum_{n} \left( I_{m,n} - \tilde{I}_{m,n} \right)^{2} \qquad (5)$$

$$\mathrm{PSNR\,(dB)} = 10 \log_{10} \frac{I_{peak}^{2}}{\mathrm{MSE}} \qquad (6)$$

where $I_{peak}$ is the peak intensity level in the original image (most commonly 255 for an 8-bit grayscale image), and M and N are the dimensions of the image.
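
As a small worked example of the two metrics, the sketch below computes MSE and PSNR for images stored as lists of rows. The function names and the 2 x 2 test images are illustrative assumptions, and the PSNR uses the common squared-peak form.

```python
import math


def mse(original, marked):
    """Average squared pixel difference between two equally sized images."""
    rows, cols = len(original), len(original[0])
    total = sum(
        (original[r][c] - marked[r][c]) ** 2
        for r in range(rows)
        for c in range(cols)
    )
    return total / (rows * cols)


def psnr(original, marked, peak=255):
    """PSNR in dB; infinite when the two images are identical."""
    err = mse(original, marked)
    if err == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / err)


original = [[100, 100], [100, 100]]
marked = [[101, 99], [100, 100]]   # two pixels changed by one grey level
```

Here the squared error sums to 2 over 4 pixels, so the MSE is 0.5 and the PSNR is 10 log10(65025 / 0.5), roughly 51.1 dB.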

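The original and recovered messages or watermarks can be compared by computing the Normalized Correlation (NC) [12]:

$$\mathrm{NC} = \frac{\sum_{i} m_i \hat{m}_i}{\sqrt{\sum_{i} m_i^{2}} \, \sqrt{\sum_{i} \hat{m}_i^{2}}} \qquad (7)$$

where $m$ is the original message and $\hat{m}$ is the recovered message. For unipolar vectors, $m \in \{0, 1\}$, and for bipolar vectors, $m \in \{-1, 1\}$.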
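
A minimal sketch of the normalized correlation between an original and a recovered watermark is given below. It assumes the usual form, the inner product divided by the product of the two vector norms; the function name and the sample bipolar vectors are made up for illustration.

```python
import math


def normalized_correlation(m, m_rec):
    """Inner product of the two messages divided by the product of their norms."""
    if len(m) != len(m_rec):
        raise ValueError("messages must have the same length")
    num = sum(a * b for a, b in zip(m, m_rec))
    den = math.sqrt(sum(a * a for a in m)) * math.sqrt(sum(b * b for b in m_rec))
    return num / den if den else 0.0


original_bits = [1, -1, 1, 1]        # bipolar watermark
one_bit_flipped = [1, -1, -1, 1]     # recovered with one error
```

With bipolar vectors, a perfect recovery gives 1, a fully inverted watermark gives -1, and the one-error example above gives 0.5.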

VII. Simulation Results 



For all the tests in this paper, MATLAB is used. All tests are performed on the 8-bit grayscale 256 x 256 cameraman image. To simulate the watermarking schemes on the cameraman image, the four quantization levels are set to T0 = 113, T1 = 124, T2 = 156, and T3 = 159.

Results of the two schemes for the cameraman image are shown in Fig. 5 and Fig. 6, respectively. The comparison of fidelity is shown in Table I. The numerical evaluation metrics for all schemes in the absence and presence of attacks are tabulated in Table II. From Table II, we notice that the proposed watermarking scheme achieves the lowest distortion in the watermarked image in the absence of attacks, and that the proposed wavelet-based technique gives an image with better fidelity than the DCT-based technique. Table II also compares our technique using DCT and using wavelets; we notice that around 50% of the input watermark bits can be extracted in the proposed scheme under most of the attacks.

In the case of DCT, the watermark can be detected in the presence of blurring, Gaussian, or compression attacks; in the case of wavelets, it can be detected in the presence of Gaussian, resizing, blurring, or compression attacks. We compare our results with Dugad's scheme [8], the LSB technique [9], and the technique in [4].

In the case of the LSB technique, it is difficult to detect the watermark once attacks have been applied to the watermarked image.

The technique in [4] gives better results than both the existing technique and the proposed one in the case of compression.

Figure 5. Watermarked image using the proposed DCT technique with and without attacks.

Figure 6. Watermarked image using the proposed DWT technique with and without attacks. Panels: (b) watermarked image, PSNR = 32.7 dB; (d) cropping; blurring; compression.

TABLE I. EVALUATION METRICS VALUES FOR ALL SCHEMES FOR THE CAMERAMAN IMAGE.

TABLE II. COMPARISON OF NC OF THE EXTRACTED WATERMARKS FOR OUR SCHEME FOR THE CAMERAMAN IMAGE AND THE OTHER EXISTING TECHNIQUES.

VIII. CONCLUSION

This paper presented blind DCT-based and DWT-based image watermarking schemes. These schemes depend on the quantization of coefficients within certain amplitude ranges in a binary manner to embed meaningful information in the image. Experimental results have shown the superiority of the proposed schemes in terms of both host image quality and blindness.

References 

[1] Cox, I J, Miller, ML & Bloom, JA 2002, Digital Watermarking, 
Morgan Kaufmann Publisher, San Francisco, CA, USA. 

[2] Schneier, B., 'Applied Cryptography', Wiley, 2nd Edition.

[3] Shaimaa A. El-said, Khalid F. A. Hussein, and Mohamed M. 
Fouad, "Adaptive Lossy Image Compression Technique," 
Electrical and Computer Systems Engineering Conference 
(ECSE'10),2010. 

[4] Mohiy Mohammed hadhoud , Abdalhameed shaalan, hanaa 
abdalaziz abdallah "A Modified Image Watermarking Using 
Scalar Quantization in Wavelet Domain" UbiCC Journal, 
Volume 4, Number 3, August 2009 

[5] A. S. Khayam, The Discrete Cosine Transform Theory and 
Application, Michigan State University ,March 10th 2003 . 

[6] A. B. Watson, Image Compression Using the Discrete Cosine 
Transform, Mathematica Journal, 4(1), 1994 ,p. 81-88. 

[7] D.A. Huffman, A method for the construction of minimum-redundancy codes. Proc. Inst. Radio Eng. 40(9), pp. 1098-1101, 1952.

[8] R. Dugad, K. Ratakonda, and N. Ahuja, "A New Wavelet-Based Scheme for Watermarking Images," Proceedings of 1998 International Conference on Image Processing (ICIP 1998), Vol. 2, Chicago, IL, October 4-7, 1998, pp. 419-423.

[9] M. Corvi and G. Nicchiotti, "Wavelet-based image watermarking for copyright protection," Scandinavian Conference on Image Analysis, SCIA '97, Lappeenranta, Finland, June 1997, pp. 157-163.






http://sites.google.com/site/ijcsis/ 
ISSN 1947-5500 



(IJCSIS) International Journal of Computer Science and Information Security, 

Vol. 9, No. 2, February 2011 

[10] P. Meerwald, Digital image watermarking in the wavelet transform domain, Master thesis, Department of Scientific Computing, University of Salzburg, Austria, 2001. http://www.cosy.sbg.ac.at/~pmeerw/Watermarking/

[11] S. Voloshynovskiy, S. Pereira, V. Iquise, and T. Pun, "Attack modeling: Towards a second generation watermarking benchmark," Journal of Signal Processing, 80(6), May 2001.

[12] C. Shoemaker, Rudko, "Hidden Bits: A Survey of Techniques for Digital Watermarking," Independent Study EER-290, Prof. Rudko, Spring 2002.






Robust Techniques of Web Watermarking 

Using Verbs, Articles and Prepositions 



Nighat Mir 
College of Engineering 

Effat University 

Jeddah, Saudi Arabia 

nighat mir@hotmail.com 



Abstract — Internet is an attractive, rapid and economical way of 
electronic information distribution. With the advent and tremendous growth of the Internet, information is going paperless and moving from paper distribution to electronic distribution.

However, this also makes protection of intellectual property very difficult. Once information is available on the Internet, it is open to threats such as illegal copying, distribution, tampering and authentication. Intellectual rights for the information available on the web are a serious issue.

In this paper, natural language digital watermarks are proposed for web-based electronic data, and the problem of establishing the authorship of web-based text/data is investigated with improved security. Several robust techniques of imperceptible web page digital watermarking using verbs, articles and prepositions are studied for the protection of content available on the www. On this basis, a web watermarking algorithm is designed and implemented. A key consisting of natural watermarks along with a unique author id (issued by the CA) is integrated into any content to be published on the web. The key to be integrated is further encrypted using AES (Advanced Encryption Standard) to add another layer of security. The algorithm is also tested with different web sites to assess its functionality and robustness.

Keywords- Digital Watermarking, Verbs, Articles, Prepositions, 
encryption, HTML, AES, CA. 



I. Introduction



The Internet is an attractive, rapid and economical way of distributing electronic information. With the advent and tremendous growth of the Internet, information is going paperless and moving from paper distribution to electronic distribution.

However, this also makes protection of intellectual property very difficult. Once information is available on the Internet, it is open to threats such as illegal copying, distribution, tampering and authentication. Intellectual rights for the information available on the web are a serious issue.

Different techniques, such as steganography, cryptography and watermarking, are used for securing information, each in a different way. Steganography hides the existence of information and makes it imperceptible to a viewer. A cover medium is used as a carrier in which secret data is embedded, so that the intended recipient is the only one who knows of the existence of the secret message [1].

Cryptography encrypts the information using a key, and only a party holding the key can decrypt and reveal the message. People are therefore aware that some hidden communication exists. Cryptography makes data unreadable by writing it in secret code, and it ensures authentication, confidentiality and integrity [2].

Watermarking, in contrast, is a process of embedding secret information into a digital signal to identify the owner of that media [3].

In this paper, several robust techniques of web page digital watermarking using common verbs, articles and prepositions are studied for the protection of content available on the www. On this basis, a web watermarking algorithm is designed and implemented. It is also tested with different web sites to assess its functionality, robustness and capacity.

The Internet contains different types of data, i.e. image, video, audio and text. Based on this organization, digital watermarking may be classified as image watermarking, video watermarking, audio watermarking, and text watermarking. The basic principles and motives are the same: to secure the information against different threats. Unauthorized copying, propagation and tampering are very common attacks and are difficult to overcome. A lot of research has been done on different types of data, but web-based text has not been highlighted in this respect.

Because digital content is easy to copy or process, it is likely to be misused. Digital watermarking is one of the efficient countermeasures against such misuse and can be categorized into perceptible and imperceptible techniques. Many perceptible techniques have been studied for text, but few imperceptible techniques are available for electronic text.

Digital watermarking has proved to be a mode of identification for the creator, owner or distributor of data. Its aim is to put the ownership of the data beyond dispute. In case of illicit use, the watermark facilitates the claim of ownership and successful examination. It makes large-scale distribution simple and economical.

Hyper Text Markup Language (HTML) is used by web browsers to understand, interpret and structure text, images and other types of data. All web browsers have default characteristics for every HTML item. Web developers can use different languages and tools to create web pages, but these are ultimately interpreted as HTML by all web browsers. Hence, HTML is a basic building block of web pages, yet the source code of these pages is easily available through a single right-click and "view source". Any data in general, and text in particular, is open to many threats and attacks. Intentional or unintentional illegal copying of data from the Internet has become a universal practice and has a great effect on the privacy of information, and copyright alone is no longer an optimal solution. Digital watermarking methods are considered a strong mechanism to identify the original owner and to prove intellectual property. Imperceptible digital web page watermarking techniques can provide solutions for protecting the intellectual property of content available on these pages.

In digital watermarking, a hidden marker is embedded in the data; it is generally unobservable and can only be extracted by a special detector. The goal is not to change the original characteristics but to exploit the insensitivity of the human perceptual organs.

With the ever-increasing number of Internet users all over the world, it is very important to secure web pages and their content. Unlike other forms of carriers, web pages offer a wide bandwidth for information hiding or embedding watermarks, and many robust techniques can be developed for web page watermarking. Web page watermarking aims to preserve the integrity of web pages, which are a very popular and rich source of information.



II. Related Work



J. Wu and D.R in [4] have proposed APS, an Authorship Proof Scheme based on natural language watermarks. A security level is predefined, and the scheme is considered secure as long as this level is less than the probability measure. Their solution caters for long text and is robust. They have used meaning and literal representations to embed watermarks and have also used edit distance for fault tolerance.

Qijun Zhao, Hondtao Lu [5] have proposed a scheme for tamper-proof web pages in which watermarks are generated on the basis of the Principal Component Analysis (PCA) technique. Upper and lower cases are considered for embedding watermarks into HTML tags.

Fei, Wang, Zhand and Li in [6] have presented a 
watermarking scheme to embed different fingerprints in XML 
data which can be used to trace illegal distribution. Their 
scheme attempts to reduce the modification attack and 
maintains the robustness level. 

Shi, Kim and S. in [7] have studied approaches for secure 
embedding and detection of a watermark in an un-trusted 
environment. They have considered Zero-Knowledge 
Watermark Detection (ZKWMD) protocols for authorship 
proof and a Chameleon-like stream cipher that achieves 
simultaneous decryption and fingerprinting of data tracing 
illegal distribution of broadcast messages. 

Some further techniques have also been proposed in [8] and [9] based on HTML web files. Mohammed and Sun in [8] have proposed some digital watermarking techniques for HTML pages, focusing on exploiting white space, line breaks, attribute ordering, string delimiters and color values. All of the above-mentioned techniques were only proposed and not implemented; however, some of them have been tested to show sample results. Ala'a and Mazin in [9] have also used HTML files to achieve secret communication. They have exploited white space to hide secret data in an HTML file and have further encrypted it, using colored data, with the Data Encryption Standard algorithm.



Wu, Jiwu, Huang, and Shi in [10] have proposed a self- 
synchronization algorithm for audio watermarking to facilitate 
assured audio data transmission. The synchronization codes are 
embedded into audio with the informative data, thus the 
embedded data have the self-synchronization ability. They 
have embedded the codes and hidden informative data into the 
low frequency coefficients in DWT (discrete wavelet 
transform) domain. 



Hasan in [11] has explored morpho-syntactic tools for text watermarking and developed a syntax-based natural language watermarking scheme for the Turkish language. The unmarked text is first transformed into a syntactic tree diagram in which the syntactic hierarchies and the functional dependencies are coded. The watermarking software then operates on the sentences in syntax tree format and executes binary changes under the control of WordNet to avoid semantic drops.



Chang and Clark in [12] have described a method for checking the acceptability of paraphrases in context. They have used Google n-gram data and a CCG parser to certify the grammaticality and fluency of the paraphrases. They have collected human judgements for the evaluation and have integrated text paraphrasing into a linguistic steganography system, using paraphrases to hide information in a cover text.

Zhu and Sang in [13] have adopted a watermarking scheme based on the DC component of the discrete cosine transform (DCT) domain. The watermarks are hidden by adjusting the block DCT coefficients of the image: the selected image is divided into blocks of 8x8 pixels, then into four non-overlapping sub-image blocks of 4x4 pixels, and the watermarks are embedded by adjusting their DCT coefficients.



Kim, Moon and Oh in [14] have proposed the idea of using word classification and inter-word space statistics. They have segmented the words to add information into text content by modifying the statistics of the inter-word space.

Meral, Unkar, Sankor, OZ and Gunor in [15] have explored morphosyntactic tools for text watermarking and have come up with syntax-based natural language watermarks. They have developed the system for the Turkish language, in which sentences in syntax tree format undergo binary changes under the control of WordNet to avoid semantic drops. The algorithm transforms the raw sentences into their Treebank representation and syntactic tree by randomizing their occurrences.



III. System Models

Figure 1: Embedding Phase (embedding the watermark using the key)

Figure 2: Extraction Phase (extracting the watermark using the key)

IV. Proposed Methodology

When an author contributes text to the web, his or her intellectual rights need to be protected. In this paper, the copyright conventions to be integrated are studied in light of English grammatical rules (verbs, articles and prepositions), which are a structural part of any text. The articles, verbs and prepositions (natural language watermarks) used in this research are among the 100 most common words in English in frequency order, which make up about half of all written material. Below there is a composite table as well as separate tables with respect to their frequencies.

To publish content and retain the copyright, a key is given to the author; whenever the author publishes something on the web, he or she needs to integrate this key with the content to be published. The key is the main component and consists of several parts. To build a key, the author first needs a unique author id from the CA (Certified Authority); natural watermarks are then added to this author id to form the key.

$$\mathrm{Key} = \left( \sum_{Length=1} (A + V + P) + \sum_{Length=1} AID \right) \qquad (1)$$

where
A = Articles
V = Verbs
P = Prepositions
AID = Author ID
Length = size of the author id and watermarks

Natural Language Watermarks (NLW) are extracted from the content, and the key is constructed depending on the number of these NLW. A different key can be generated for each publication, but always with the same author id, as it is uniquely generated. So far the size of the key and of the author id is not restricted to any specific length, but such a restriction can be taken into consideration.

The CA can be a registered company issuing ids, or the role can be regulated by the website owners.

In brief, a unique author id is concatenated with three sets of natural watermarks (verbs, articles and prepositions) to generate a secret key, which is further encrypted using the cryptographic algorithm AES (Advanced Encryption Standard) before it is added to a web page.

The sets of natural watermarks used are:

A. List of most frequently used verbs in English:

List of Verbs: letter/s and values

Letter/s    Frequency
is          15%
are         34%

TABLE 1: Verbs and Frequencies

B. List of most frequently used indefinite articles in English:

List of Articles: letter/s and values

Letter/s    Frequency
a           15%
an          23%

TABLE 2: Articles and Frequencies

C. List of most frequently used prepositions in English:

List of Prepositions: letter/s and values

Letter/s    Frequency
of          9%
to          23%
in          15%
for         16%

TABLE 3: Prepositions and Frequencies

V. Implementation Details 

The proposed system has been implemented in the C# language using the Visual Studio .NET framework. The program works as a parser: it reads and checks the textual content from the <body> tag of an HTML page and counts how many natural watermarks (verbs, articles and prepositions) it contains. It then concatenates these natural language watermarks with the author id to generate a secret key. The author id should be unique for every author and usually needs to be assigned by the CA (Certified Authority). My program can also act as an individual CA for any website and generate an author id, or it can take a pre-assigned id.

The program can also generate a key for published websites and static pages, and can create one at run time. The key to be integrated in an HTML page is further encrypted using the standard AES (Advanced Encryption Standard) to add another layer of security.
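
The parsing and key-building steps described above can be sketched as follows. This is a Python illustration rather than the C# program itself: the class and function names are hypothetical, the layout of the key string is an assumption (the text does not specify one), and the AES encryption step is omitted because the Python standard library has no AES implementation.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# The eight natural watermarks named in the paper.
NATURAL_WATERMARKS = ("is", "are", "a", "an", "of", "to", "in", "for")


class BodyText(HTMLParser):
    """Collects the text found inside <body>, skipping scripts and styles."""

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.in_body and not self.skip_depth:
            self.chunks.append(data)


def count_watermarks(html):
    """Count whole-word occurrences of each natural watermark in the body."""
    parser = BodyText()
    parser.feed(html)
    parser.close()
    words = re.findall(r"[a-z]+", " ".join(parser.chunks).lower())
    counts = Counter(w for w in words if w in NATURAL_WATERMARKS)
    return {w: counts[w] for w in NATURAL_WATERMARKS}


def build_key(author_id, counts):
    """Concatenate the author id with the counts (hypothetical layout)."""
    return author_id + "|" + "|".join(f"{w}{n}" for w, n in counts.items())


page = ("<html><head><title>is is</title></head><body>"
        "<p>This is a test of the parser. There are cats in an inn "
        "for sale to go.</p><script>var a = 'is';</script></body></html>")
counts = count_watermarks(page)
key = build_key("nighat", counts)
```

In this sample page each of the eight words occurs exactly once in the body; "This" and "inn" are not counted because matching is on whole words, and the text in the title and the script is ignored.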



VI. Results and Analysis



I have tested many websites; here I show the results for a few of them: Wikipedia, EnglishThroughStories and BBC news.

Test 1: In Wikipedia I searched for an article (information security), given at the link below, and found 768 natural watermarks, which shows that a large bandwidth is available. I used an author id (nighat), which I kept the same for the different tests on different websites.

Web link of Wikipedia used for the embedding cycle:

http://en.wikipedia.org/wiki/Information_security

Table 4 shows the detail of each watermark used to generate a secret key.

Author id: nighat

Letter/s    Frequency
is          128
are         70
of          265
to          217
in          88
for         73

TABLE 4: Watermarks of Wikipedia

Generated encrypted secret key to be embedded in HTML page:

liy5tiw6HB/EtyraBaZA/rhzz5xt/zpvD/zcwu8+uxd5dNQXfTlcco9tczxz/dR3mivxi

Test 2: In EnglishThroughStories I searched for a script, given at the link below, and found 498 natural watermarks. I used the same author id (nighat).

Web link of EnglishThroughStories used for the embedding cycle:

http://www.englishthroughstories.com/scripts/scripts.html

Table 5 shows the detail of each watermark used to generate a secret key.

Author id: nighat

Letter/s    Frequency
is          24
are         14
of          93
to          224
in          90
for         53

TABLE 5: Watermarks of EnglishThroughStories

Generated encrypted secret key to be embedded in HTML page:

OxD5Zi jJHu+lkv2T7weHhHLmTCueqtO5SJBTdsgA/90y+Sx6w2alTW9GPrTrS4Ntw2KX/

Test 3: In BBC news I searched for a news article, given at the link below, and found 224 natural watermarks. I used the same author id (nighat).

Web link of BBC news used for the embedding cycle:

http://www.bbc.co.uk/news/world-middle-east-12362826

Table 6 shows the detail of each watermark used to generate a secret key.

Author id: nighat

Letter/s    Frequency
is          17
are         10
of          68
to          57
in          48
for         24

TABLE 6: Watermarks of BBC news

Generated encrypted secret key to be embedded in HTML page:

pXD5ZijJHU+lkv2T7weHhHLniTCueqtO5SJBTdsgA/907kB9UiWS4lde0B+o5zl9ZXGXJC

A graphical view and comparison of each set of watermarks with respect to the frequency of every watermark used in English text is shown in Figure 3.

Figure 3: Frequency Comparison of Watermarks



VII. Conclusion



Different natural language watermarks have been used in this research. Semantic-based watermarks have been proposed using verbs (is, are), articles (a, an) and prepositions (of, in, to, for). The system has been implemented in the C# language, and different common websites have been used for testing to see the effect on the results. Natural watermarks are combined with an author id to generate a secret key to protect the copyright of a web page. The secret key is further encrypted using AES (Advanced Encryption Standard), one of the popular and strong encryption standards. This secured, encrypted key is embedded in the web page for the protection of authorship rights.
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Abstract — In this paper, a new wormhole-switched routing 
algorithm for irregular 2-dimensional (2-D) mesh Network-on-Chip interconnection is proposed, in which no virtual channel is used either for routing or for passing oversized nodes (ONs). We also improve the message-passing parameters of ONs and compare simulation results of our algorithm with several state-of-the-art algorithms. Simulation results show that the proposed algorithm, i-xy (improved/irregular-xy), has a higher saturation point than the extended-xy (e-xy) and OAPR algorithms. Furthermore, it has fewer blocked messages and more routed/switched messages in the network. Moreover, for the irregular 2-D mesh NoC, a network that uses i-xy has 35 to 100 percent higher utilization than networks that use e-xy and OAPR.

Keywords-Network-on-Chip, performance, wormhole switching, 
irregular 2-D mesh, routing, utilization 

I. Introduction 

As technology scales, Systems-on-Chips (SoCs) are 
becoming increasingly complex and heterogeneous. One of the 
most important key issues that characterize such SoCs is the 
seamless mixing of numerous Intellectual Property (IP) cores 
performing different functions and operating at different clock 
frequencies. In just the last few years, Network-on-Chip (NoC) 
has emerged as a leading paradigm for the synthesis of multi- 
core SoCs [1]. The routing algorithm used in the 
interconnection communication NoC is the most crucial aspect 
that distinguishes various proposed NoC architectures [2], [3]. 
However, the use of VCs introduces some overhead in terms of 
both additional resources and mechanisms for their 
management [4]. 

Each IP core has two segments to operate in 
communication and computation modes separately [5]. On-chip
packet-switched interconnection architectures, called NoCs,
have been proposed as a solution for the communication 
challenges in these networks [6]. NoCs relate closely to 
interconnection networks for high-performance parallel 
computers with multiple processors, in which each processor is 
an individual chip. 

A NoC is a group of routers and switches that are connected
to each other by short point-to-point links to provide a
communication backbone for the IP cores of a SoC. The most
common template proposed for NoC communication is a 2-D mesh
network topology in which each resource is connected to a
router [7]. In these networks, source nodes (IP cores) generate
packets that include headers as well as data; routers then
transfer them over the connected links to the destination
nodes [8].

The wormhole (WH) switching technique proposed by
Dally and Seitz [9] has been widely used in interconnection
networks such as those of [10], [11], [12], [15] and [16]. In
the WH technique, a packet is divided into a series of
fixed-size units of data, called flits. Wormhole routing
requires the least buffering (flits instead of packets) and
allows low-latency communication. To avoid deadlocks among
messages, multiple virtual channels (VCs) are simulated on each
physical link [12]. Each unidirectional virtual channel is
realized by an independently managed pair of message
buffers [13].
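
As a minimal illustration of the flit division described above, the following Python sketch splits a packet payload into fixed-size flits. The function name `split_into_flits`, the byte-oriented payload, and the zero padding of the last flit are assumptions made for illustration; the paper does not specify a flit format.

```python
def split_into_flits(payload: bytes, flit_size: int) -> list[bytes]:
    """Split a packet payload into fixed-size flits.

    Assumption (not from the paper): the last flit is zero-padded
    so that every flit has exactly flit_size bytes.
    """
    if flit_size <= 0:
        raise ValueError("flit_size must be positive")
    flits = []
    for start in range(0, len(payload), flit_size):
        chunk = payload[start:start + flit_size]
        flits.append(chunk.ljust(flit_size, b"\x00"))
    return flits


if __name__ == "__main__":
    for flit in split_into_flits(b"abcdefghij", 4):
        print(flit)
```

Because only flits, not whole packets, need to be buffered at a router, the buffer size per channel is bounded by the flit size rather than by the packet length.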

This paper presents a new routing algorithm for irregular
mesh networks that enhances a previously proposed technique.
The primary distinction between the previous method and the
method presented in this paper is the way messages pass ONs in
the network. Simulation results show that the network
utilization achieved by the e-xy and OAPR algorithms is worse
than that of the improved one, i-xy. We have simulated all
three algorithms for 5% and 10% of oversized nodes with uniform
and hotspot traffic. Results for all situations show that our
algorithm has higher utilization and can work at higher message
injection rates, with a higher saturation point.

The rest of the paper is organized as follows. In Section II,
some deterministic routing algorithms are discussed and the new
i-xy irregular routing algorithm is explained. Section III
discusses our experimental results. Finally, Section IV
summarizes and concludes the work.

II. Irregular Routing 

Routing is the act of passing data from one node to
another in a given scheme [11]. Currently, most of the routing
algorithms proposed for NoCs are deterministic algorithms that
cannot route packets in the presence of oversized nodes. Since
adaptive algorithms are very complex for Networks-on-Chip, a
flexible deterministic algorithm is a suitable choice [14].
Deterministic routing algorithms establish the path as a
function of the destination address, always applying the same
path between every pair of nodes. A well-known example is
dimension-order routing (x-y routing), which routes packets by
crossing dimensions in strictly increasing (or decreasing)
order, reducing the offset in one dimension to zero before
routing in the next one [13]. To avoid deadlocks among
messages, multiple virtual channels (VCs) are simulated on each
physical channel [12]. In this paper, however, we use no VCs in
the proposed algorithm and introduce a deadlock- and
livelock-free irregular routing algorithm.
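
The dimension-order rule described above can be sketched in a few lines of Python for a regular mesh without oversized nodes. The coordinate convention `(x, y)` and the function names `xy_next_hop` and `xy_path` are assumptions made for illustration only.

```python
def xy_next_hop(cur, dst):
    """Return the next node on the dimension-order (x-y) path.

    Returns None when cur is already the destination. Nodes are
    (x, y) tuples; this convention is an assumption of the sketch.
    """
    (cx, cy), (dx, dy) = cur, dst
    if cx != dx:
        # First reduce the offset along the x dimension to zero.
        return (cx + (1 if dx > cx else -1), cy)
    if cy != dy:
        # Only then route along the y dimension.
        return (cx, cy + (1 if dy > cy else -1))
    return None


def xy_path(src, dst):
    """Collect the full deterministic path from src to dst."""
    path = [src]
    while (nxt := xy_next_hop(path[-1], dst)) is not None:
        path.append(nxt)
    return path


if __name__ == "__main__":
    print(xy_path((0, 0), (2, 1)))
```

Since the path depends only on the source and destination, every message between the same pair of nodes follows the same route, which is what makes the algorithm deterministic.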

Many algorithms have been suggested to operate in faulty
conditions without deadlock and livelock. These algorithms can
be modified for use in irregular interconnection networks. Some
of them, such as [10], [11], [12], [15] and [16], are based on
deterministic algorithms. In [15], Wu proposed a deterministic
algorithm that uses the odd-even turn model to pass block
faults. The algorithm proposed by Lin et al. [16] also uses
this method. Since our proposed algorithm is similar to these
algorithms (it uses no virtual channels), the rest of this
section describes how these deterministic algorithms work and
how we have improved them.

A. Extended-XY Routing Algorithm 

The algorithm presented by Wu [15], extended-xy, uses no
VCs by implementing the odd-even turn model, which is discussed
in [17]. Such an algorithm is able to pass faulty rings and
orthogonal faulty blocks. The algorithm consists of two phases:
in phase 1, the offset along the x dimension is reduced to zero
and, in phase 2, the offset along the y dimension is reduced to
zero [15].

This algorithm has two modes, normal and abnormal.
The extended-xy routing follows the regular x-y routing (and
the packet is in a "normal" mode) until the packet reaches a
boundary node of a faulty block. At that point, the packet is
routed around the block (and the packet is in an "abnormal"
mode) clockwise or counterclockwise based on certain rules.
Unlike routing in a fault-free network, the fault-tolerant
routing protocol has to prepare for "unforeseen" situations: a
faulty block encountered during the routing process. This is
done by three means: 1) The packet should reside in an even
column when reaching a north or south boundary node of the
routing block in phase 1. 2) In phase 1, the packet should be
routed around the west side since, once the packet is
east-bound, it cannot be changed to west-bound later. 3) The
two boundary lines, one even and one odd, offer just enough
flexibility for the packet to make turns for all situations.

In phase 2, to route around the routing block, odd columns
(even columns) are used to perform routing along the y
dimension when the packet is east-bound (west-bound). The
packet is routed around the routing block either clockwise or
counterclockwise in phase 2. Note that during the normal mode
of routing the packet along the x or y dimension, no 180-degree
turn is allowed. For example, the positive x direction cannot
be changed to the negative x direction [15]. Additional
information about the extended-xy algorithm, including its
formal description, can be found in [15].
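
The column-parity rule stated for phase 2 can be written as a small predicate. This is only a sketch of that single rule, not of the full extended-xy algorithm in [15]; the function name `may_route_along_y` and the string labels for the heading are hypothetical.

```python
def may_route_along_y(column: int, heading: str) -> bool:
    """Phase-2 parity rule as stated in the text.

    East-bound packets use odd columns for y-dimension routing,
    west-bound packets use even columns. The heading labels
    "east" and "west" are an assumption of this sketch.
    """
    if heading == "east":
        return column % 2 == 1
    if heading == "west":
        return column % 2 == 0
    raise ValueError("heading must be 'east' or 'west'")


if __name__ == "__main__":
    for col in range(4):
        print(col, may_route_along_y(col, "east"), may_route_along_y(col, "west"))
```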



B. OAPR Routing Algorithm 

The algorithm presented by S.Y. Lin et al. [16], OAPR, is
described as follows:

1) Avoid routing paths along boundaries of ONs. In faulty
meshes, information about faulty blocks is known only at run
time. The locations of ONs, however, are known in advance.
Therefore, OAPR can avoid routing paths along the boundaries of
ONs and reduce the traffic load around ONs.

2) Support f-rings and f-chains for placements of ONs. OAPR
solves the drawbacks of e-xy and uses the odd-even turn model
to avoid deadlock systematically. The e-xy algorithm cannot
support ONs placed at the boundaries of irregular meshes. To
solve this problem, OAPR applies the concepts of f-rings and
f-chains [12]. With this feature, OAPR works correctly if ONs
are placed at the boundaries of irregular meshes. Additional
information about the OAPR algorithm, including its formal
description, can be found in [16].

C. Improved-XY Routing Algorithm

This algorithm is based on if-cube2 [10], [11] and, like
extended-xy [15], the OAPR algorithm [16] and the odd-even turn
model [17], uses no virtual channels. Like the extended-xy
algorithm, it is able to pass ring blocks of oversized nodes,
and it can also pass chain blocks, which are not considered in
extended-xy routing. Moreover, when a network uses the OAPR
algorithm, all vertically overlapping ONs must be aligned on
the east edge; in improved-xy this constraint has been removed.
As in [11], each message is injected into the network as a row
message and its direction is set to null until it reaches the
column of the destination node. It is then changed to a column
message to reach the destination. A column message cannot
change its path to that of a row message unless it encounters
an oversized region. In such a situation, a column message can
change its direction to clockwise or counter-clockwise. At each
node, a message is first checked to see whether it has reached
its destination node. If it has not, and it is a row message
that has just reached the column of the destination node, it is
changed to a column message.

For regular meshes, the e-cube provides deadlock-free 
shortest path routing. At each node during the routing of a 
message, the e-cube specifies the next hop to be taken by the 
message. The message is said to be blocked by an oversized 
node, if its e-cube hop is on an oversized region. The proposed 
modification uses no virtual channels and tolerates multiple 
oversized blocks. 

To route messages around rings or chains, messages are
classified into four types: East-to-West (EW), West-to-East
(WE), North-to-South (NS), or South-to-North (SN). EW and
WE messages are known as row messages and NS and SN as
column messages. A message is labeled as either an EW or WE
message when it is generated, depending on its destination.
Once a message completes its row hops, it becomes an NS or an
SN message to travel along the column. Thus, row messages
can become column messages; however, NS and SN messages
cannot change their types.
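
The message classification above can be sketched as follows. The names `MsgType`, `initial_type`, and `after_row_hops` are hypothetical. The sketch assumes that column indices grow eastward, and it follows step 1 of the i-xy listing in using s1 > d1 for NS and s1 < d1 for SN; the treatment of a message generated in its destination column is also an assumption.

```python
from enum import Enum


class MsgType(Enum):
    EW = "east-to-west"
    WE = "west-to-east"
    NS = "north-to-south"
    SN = "south-to-north"


ROW_TYPES = {MsgType.EW, MsgType.WE}


def initial_type(s0: int, d0: int) -> MsgType:
    """Label a newly generated message from its column offset.

    Assumption: column indices grow eastward, and a message that
    starts in its destination column is labeled WE by default.
    """
    return MsgType.WE if d0 >= s0 else MsgType.EW


def after_row_hops(msg_type: MsgType, s1: int, d1: int) -> MsgType:
    """Convert a row message to a column message once its row hops are done.

    Column messages (NS, SN) never change their type.
    """
    if msg_type not in ROW_TYPES:
        return msg_type
    if s1 > d1:
        return MsgType.NS
    if s1 < d1:
        return MsgType.SN
    return msg_type  # already in the destination row


if __name__ == "__main__":
    t = initial_type(s0=0, d0=3)
    print(t, after_row_hops(t, s1=5, d1=2))
```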

Next, if a message encounters an oversized region,
the Set-Direction(M) procedure is called to set the
direction of the message. The role of this procedure is to pass
the oversized region by setting the direction of the message to
clockwise or counter-clockwise. The direction of the message is
set to null again once it has passed the oversized region.
While the direction of a message is null, the e-cube algorithm
is used to route it, and it can use odd or even rows and
columns. Fig. 1 shows the use of odd and even rows and columns
when a message is passing an oversized node.

Using this modified way of passing oversized regions,
simulations are performed to evaluate the performance of the
enhanced algorithm in comparison with the algorithms
proposed in prior work. Simulation results indicate an
improvement in utilization and in switched/routed messages
for different cases of ONs and different traffic patterns.
Furthermore, the enhanced approach can handle higher message
injection rates (i.e., it has a higher saturation point). The
proposed algorithm, Improved-XY (i-xy), and the
Set-Direction(M) procedure are given in the remainder of this
section.

Figure 1. Usage of odd and even rows or columns.

Algorithm Improved-XY (i-xy)

/* The current host of message M is (s1, s0) and its
destination is (d1, d0). */

0. If s1 = d1 and s0 = d0, consume M and return.

1. If M is a row message and s0 = d0, then change its type to
NS, if s1 > d1, or SN, if s1 < d1.

2. If the next e-cube hop is not blocked by an oversized node,
then set the status of M to normal and set the direction of M
to null.

3. Otherwise, set the status of M by Set-Direction(M).

4. If the direction of M is null, then use its x-y hop.

5. Otherwise, route M on the oversized node according to the
specified direction.

Procedure Set-Direction(M)

0. If M is a column message and its direction is null, then
set (l1, l0) = (s1, s0).

1. If the direction of M ≠ null and the current node is an end
node, then reverse the direction of M and return.

2. If M is a column message and s0 ≠ l0, then return.

3. If M is a column message and s1 ≠ l1, s0 = l0, then set its
direction to null.

4. If the next e-cube hop of M is not faulty, set its direction
to null and return.

5. If the direction of M is not null, then return.

6. If M is a WE message, set its direction to

6.1 clockwise if s1 < d1, or

6.2 counter-clockwise if s1 > d1, or

6.3 either direction if s1 = d1.

7. If M is an EW message, set its direction to

7.1 clockwise if s1 > d1, or

7.2 counter-clockwise if s1 < d1, or

7.3 either direction if s1 = d1.

8. If M is an NS message, set its direction to clockwise, if
the current node is not located on the EAST boundary of the
2-D mesh, or counter-clockwise, otherwise, and set
(l1, l0) = (s1, s0).

9. If M is an SN message, set its direction to
counter-clockwise, if the current node is not located on the
EAST boundary of the 2-D mesh, or clockwise, otherwise, and set
(l1, l0) = (s1, s0).
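
The direction choice in steps 6 to 9 of Set-Direction(M) can be sketched in Python as below. This covers only those four steps as printed in the listing, not the bookkeeping of steps 0 to 5. The function name `choose_direction`, the string labels, and the `tie` parameter (which resolves the "either direction" case) are assumptions made for illustration.

```python
CW = "clockwise"
CCW = "counter-clockwise"


def choose_direction(msg_type, s1, d1, on_east_boundary=False, tie=CW):
    """Pick the direction for a blocked message (steps 6 to 9).

    msg_type is one of "WE", "EW", "NS", "SN". The tie argument is a
    hypothetical parameter that resolves the "either direction" case.
    """
    if msg_type == "WE":
        if s1 < d1:
            return CW
        if s1 > d1:
            return CCW
        return tie
    if msg_type == "EW":
        if s1 > d1:
            return CW
        if s1 < d1:
            return CCW
        return tie
    if msg_type == "NS":
        return CCW if on_east_boundary else CW
    if msg_type == "SN":
        return CW if on_east_boundary else CCW
    raise ValueError(f"unknown message type: {msg_type!r}")


if __name__ == "__main__":
    print(choose_direction("WE", s1=1, d1=4))
    print(choose_direction("NS", s1=4, d1=1, on_east_boundary=True))
```

Note that WE and EW messages mirror each other, as do NS and SN messages, so the four cases reduce to two rules and their reflections.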

D. Deadlock- and Livelock-Freeness

A WE message can travel from north to south or south to
north if its next e-cube hop is an oversized node. A
north-to-south (south-to-north) WE message can take
south-to-north (north-to-south) hops only if it encounters an
end node and takes a u-turn at the end node. That no deadlock
occurs among EW messages can be shown by similar arguments. NS
messages can travel from north to south but not from south to
north; there cannot be a deadlock between NS messages waiting
in different rows. NS messages are designed to get around the
oversized components in a counterclockwise direction. An NS
message can take a u-turn at an end node on the west boundary
of the 2-D mesh and change its direction to clockwise, but
cannot take a u-turn at the east boundary of the 2-D mesh,
since no entire row of out-of-order components is allowed.
Thus, no deadlock can occur between NS messages waiting in the
same row. That no deadlock can occur among SN messages can be
shown by similar arguments. Since the number of oversized nodes
and broken links is finite and a message never visits an
oversized node more than once, our routing scheme is also
livelock-free.

III. Results and Discussions

In this section, we describe how we perform the simulation
and obtain results from the simulator. Moreover, we show the
improvements that our modification brings to the primitive
algorithms. In order to model the interconnection network, an
object-oriented simulator was developed based on [10], [11].

The parameters we have considered include the average
number of switched messages (ANSM) and the average number of
routed messages (ANRM) in each period of time. Another
parameter examined in this paper is the utilization of the
network that uses our routing algorithm, i-xy. Utilization is
the number of flits that pass from one node to another over any
link in each cycle, divided by the bandwidth. Bandwidth is
defined as the maximum number of flits that can be transferred
across the normal links in one cycle of the network. We have
examined utilization over message injection rate (MIR) and
average message delay (AMD) over utilization for all sets of
cases. The last parameter we have considered is the average
number of blocked messages (ANBM) in the network. The
simulation methodology is described in the rest of this
section.
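
Under one reading of the definition above, utilization is the number of flits moved per cycle divided by the bandwidth, averaged over the measured cycles. The sketch below follows that reading; the function name `utilization`, the per-cycle input list, and the averaging over cycles are assumptions, since the paper does not give a formula.

```python
def utilization(flits_moved_per_cycle, bandwidth):
    """Average link utilization over a run.

    flits_moved_per_cycle: number of flits that crossed any link in
    each measured cycle. bandwidth: maximum number of flits that can
    cross the normal links in one cycle. Both are assumed inputs.
    """
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    if not flits_moved_per_cycle:
        raise ValueError("need at least one measured cycle")
    total_moved = sum(flits_moved_per_cycle)
    total_capacity = len(flits_moved_per_cycle) * bandwidth
    return total_moved / total_capacity


if __name__ == "__main__":
    print(f"{utilization([10, 20, 30], bandwidth=100):.1%}")
```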

Figure 2. Utilization of i-xy, e-xy, and OAPR routing algorithms for 5% and
10% ONs with 32-flit packets a) Uniform traffic b) Hotspot traffic.



A. Simulation Methodology 

A flit-level simulator has been designed. We record average
message latencies, utilization and some other parameters
measured in the network, with the time unit equal to the
transmission time of a single flit, i.e. one clock cycle. Our
study is performed for two rates of oversized nodes: 5% and
10%. The message generation interval has an exponential
distribution, which leads to a Poisson distribution of the
number of messages generated in a specified interval. In our
simulation studies, we assume a message length of 32 flits and
use an 8 x 8 2-dimensional irregular mesh network, in which it
takes one cycle to transfer a flit on a physical channel. Two
different traffic patterns are simulated:

• Uniform traffic: The source node sends messages to 
any other node with equal probability. 

• Hotspot traffic: Messages are destined to a specific 
node with a certain probability and are otherwise 
uniformly distributed. 
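
The exponentially distributed generation interval mentioned in the methodology can be sketched with Python's standard `random` module. The function name `generation_times`, the seed, and the example rate are assumptions for illustration; they are not taken from the authors' simulator.

```python
import random


def generation_times(rate, horizon, seed=0):
    """Return message generation times in [0, horizon).

    Inter-arrival gaps are drawn from an exponential distribution with
    the given rate (messages per cycle), so the number of messages in
    any fixed interval follows a Poisson distribution.
    """
    rng = random.Random(seed)
    t = 0.0
    times = []
    while True:
        t += rng.expovariate(rate)
        if t >= horizon:
            return times
        times.append(t)


if __name__ == "__main__":
    times = generation_times(rate=0.005, horizon=200_000.0, seed=1)
    # The count should be close to rate * horizon = 1000 on average.
    print(len(times))
```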

The number of messages generated for each simulation
result depends on the traffic distribution and is between
1,000,000 and 3,000,000 messages. The simulator has three
phases: start-up, steady-state, and termination. The start-up
phase is used to ensure the network is in steady state before
measuring message latency, so we do not gather statistics
for the first 10% of generated messages. All measures are
obtained from the remaining messages, generated in the
steady-state phase. Messages generated during the termination
phase are also not included in the results. The termination
phase continues until all the messages generated during the
second phase have been delivered [10], [11]. In the rest of
this section we study the effect of using predefined odd/even
rows and columns on the performance of i-xy. We perform this
analysis under different traffic distribution patterns. Note
that only part of the simulation results is presented in this
paper.
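
The start-up rule described above (statistics for the first 10% of generated messages are not gathered) can be sketched as a simple filter. The function names and the assumption that latencies are stored in generation order are illustrative; messages from the termination phase are assumed to have been excluded already.

```python
def steady_state_latencies(latencies, warmup_percent=10):
    """Drop the start-up portion of a run.

    latencies: per-message latencies in generation order (assumed).
    warmup_percent: share of the earliest messages to discard.
    """
    skip = len(latencies) * warmup_percent // 100
    return latencies[skip:]


def mean_latency(latencies, warmup_percent=10):
    """Average latency over the steady-state messages only."""
    kept = steady_state_latencies(latencies, warmup_percent)
    if not kept:
        raise ValueError("no steady-state messages to average")
    return sum(kept) / len(kept)


if __name__ == "__main__":
    sample = list(range(1, 21))  # 20 messages with latencies 1..20
    print(mean_latency(sample))
```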

Figures 2, 3, 4, 5, and 6 show the simulation results for two
different oversized node cases, 5 percent and 10 percent, with
uniform and hotspot (p=10%) traffic.

B. Comparison of i-xy, e-xy, and OAPR Routing Algorithms

Uniform traffic is the most used traffic model in the
performance analysis of interconnection networks [10], [11].
Figs. 2a, 3a, 4a, 5a, and 6a display the performance of the
i-xy, e-xy, and OAPR routing algorithms in a 2-D irregular mesh
interconnection network for this traffic pattern.

In order to generate hotspot traffic we used a model 
proposed in [10]. According to this model each node first 
generates a random number. If it is less than a predefined 
threshold, the message is sent to the hotspot node. Otherwise, it 
is sent to other nodes of the network with a uniform 
distribution. 
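
The hotspot model described above can be sketched as follows. The function name `pick_destination` and its parameters are hypothetical. The sketch assumes that a node never sends to itself and that, in the uniform branch, the hotspot node remains among the candidate destinations; the paper does not settle either point.

```python
import random


def pick_destination(src, nodes, hotspot, p_hot, rng):
    """Choose a destination under the hotspot traffic model.

    A random number is drawn first; if it is below the threshold p_hot
    the message goes to the hotspot node, otherwise the destination is
    uniform over all nodes except the source.
    """
    if src != hotspot and rng.random() < p_hot:
        return hotspot
    candidates = [n for n in nodes if n != src]
    return rng.choice(candidates)


if __name__ == "__main__":
    mesh = [(x, y) for x in range(8) for y in range(8)]  # 8 x 8 mesh
    rng = random.Random(42)
    sample = [pick_destination((3, 3), mesh, (0, 0), 0.10, rng) for _ in range(10_000)]
    # Share of messages sent to the hotspot node, a little above 10%.
    print(sample.count((0, 0)) / len(sample))
```

Setting `p_hot` to zero reduces the model to the uniform traffic pattern.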



(IJCSIS) International Journal of Computer Science and Information Security,
Vol. 9, No. 2, 2011

As the mesh interconnection network is not a symmetric
network, we have considered two types of simulation for
hotspot traffic in this network. In one group of simulations, a
corner node is selected as the hotspot node, and in the other
group a node in the middle of the network is chosen as the
hotspot node; the results are finally averaged. The hotspot
rate considered in our study is 10%. Figs. 2b, 3b, 4b, 5b, and
6b illustrate the performance of the three above-mentioned
routing algorithms for the hotspot traffic distribution
pattern.

Figure 3. Performance of i-xy, e-xy, and OAPR routing algorithms for 5%
and 10% ONs with 32-flit packets a) Uniform traffic b) Hotspot traffic.

We defined utilization as the major performance metric. For
an interconnection network, the system designer will specify a
utilization requirement. Figs. 2a and 2b show the utilization
over the message injection rate for two cases of oversized
nodes with two different traffic patterns, uniform and hotspot
traffic, on an 8 x 8 irregular 2-dimensional mesh
Network-on-Chip. As we can see, the network that uses the
extended-xy or OAPR algorithm saturates at a low MIR, while the
improved-xy algorithm has a higher saturation point. As an
example, in the 10% case the utilization of extended-xy at
0.0033 MIR is lower, 16.67%, and that of OAPR is 25%, yet the
other algorithm, improved-xy, works normally even at 0.0067
MIR, with 33.8% utilization at 100% traffic load (Fig. 2a). In
fact, our irregular routing algorithm has higher utilization. A
similar improvement can be found for the other traffic pattern.
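
The relative gains implied by the utilization values quoted above can be checked with a short calculation. Reading the improvement as a relative change, (new - baseline) / baseline, is an assumption; the paper does not state how its "35 percent to 100 percent" range was computed, although these values land close to it.

```python
def relative_improvement(new, baseline):
    """Relative change of new over baseline, as a fraction."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (new - baseline) / baseline


if __name__ == "__main__":
    # Utilization values (in percent) quoted for the 10% ON case, Fig. 2a.
    i_xy, oapr, e_xy = 33.8, 25.0, 16.67
    print(f"i-xy vs OAPR: {relative_improvement(i_xy, oapr):.1%}")
    print(f"i-xy vs e-xy: {relative_improvement(i_xy, e_xy):.1%}")
```

With these inputs the gain over OAPR is roughly 35% and the gain over extended-xy is roughly 103%.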

Figure 4. ANBM of i-xy, e-xy, and OAPR routing algorithms for 5% and
10% ONs with 32-flit packets a) Uniform traffic b) Hotspot traffic.

The most important comparison we have made between
these three algorithms is the average message delay over
utilization. Comparative performance across different cases in
Fig. 3a and Fig. 3b is specific to the oversized node sets
used. For each case, we have simulated the previous sets up to
100% traffic load.

Figure 5. ANSM of i-xy, e-xy, and OAPR routing algorithms for 5% and
10% ONs with 32-flit packets a) Uniform traffic b) Hotspot traffic.

As an example, we consider the average message delay of the
algorithms at 16% utilization in the 5% case with uniform
traffic (Fig. 3a). At this point, the network that uses e-xy
has an AMD of more than 183 at 100% traffic load and the
network that uses OAPR has an AMD of more than 38, while the
network using i-xy has an AMD of less than 24 and is not
saturated. Comparing the utilization of these algorithms at
100% traffic load, the network using i-xy has 32.5%
utilization, whereas OAPR has 27.86% and e-xy has just 16.67%.
Our proposed algorithm thus improves network utilization at
100% traffic load by more than 16% compared to OAPR for this
case, and roughly doubles it compared to extended-xy. The
improvement in the other case is also considerable.

The next parameter we have examined is the average
number of blocked messages (ANBM) in each cycle, i.e., the
average number of messages that are blocked in the network
because no buffer is available for them to pass to the next
node. If the nodes of a communication system have more free
buffers, messages can be delivered more easily across the
interconnection network.

As shown in Fig. 4a and 4b, a fraction of the delay that
messages encounter is the delay of waiting for an empty buffer
for the next hop. For instance, comparing the three algorithms
in Fig. 4b for the 10% case with hotspot traffic at 0.0025
MIR, when a network uses e-xy, over 3.35 messages are blocked
in every cycle, and for OAPR this number is more than 0.99
messages, but with the i-xy algorithm fewer than 0.55 messages
are blocked in every cycle. The same holds, to a substantial
degree, for the other case shown in Fig. 4a for uniform
traffic.

Fig. 5 shows the average number of switched messages
(ANSM) in each cycle over the message injection rate (MIR)
for all cases. The network that uses the i-xy algorithm shows a
minor improvement at 100% traffic load compared to the other
two above-mentioned algorithms. As an example, in Fig. 5a in
the 10% case, the ANSM of extended-xy and OAPR at the
saturation point is about 36.5, yet the ANSM of the other
algorithm, improved-xy, is more than 37. In fact, our irregular
routing algorithm behaves similarly to the others for this
parameter. For the hotspot traffic distribution, however, this
parameter shows a larger improvement.

Figure 6. ANRM of i-xy, e-xy, and OAPR routing algorithms for 5% and
10% ONs with 32-flit packets a) Uniform traffic b) Hotspot traffic.

The last parameter we consider is the average number of
routed messages (ANRM) in each cycle. As shown in Fig. 6a and
Fig. 6b, the ANRM of improved-xy is higher than that of the
extended-xy and OAPR algorithms. For instance, in Fig. 6b with
hotspot traffic in the 5% case, the ANRM of extended-xy at
0.003 MIR is 1.12 messages, and the network saturates at
0.0055. The network that uses the OAPR algorithm has an ANRM of
2.05 at its saturation point, but for the improved-xy algorithm
this number is more than 2.37 at its saturation point of 0.007
MIR. An improvement can also be seen for uniform traffic in
Fig. 6a.

IV. Conclusion 

Designing a deadlock-free routing algorithm that can
tolerate an unlimited number of oversized nodes is not an easy
job. In the existing literature, oversized blocks are expanded
into rectangular shapes by disabling good nodes, to facilitate
the design of deadlock-free routing algorithms for 2-D
irregular mesh networks. The simulation results show an
improvement in network utilization (from 35% to 100%), so that
the utilization sacrificed in order to work with rectangular
oversized nodes can be recovered if the number of original
oversized nodes is less than 10% of the total network.

We have simulated all three algorithms for the
same message injection rates, oversized node situations,
message lengths, network size, and percentages of oversized
nodes, and in many cases our algorithm gives better results
than the other two algorithms.

We also showed that these oversized blocks can be handled
under various traffic patterns and different numbers of
oversized nodes. The deterministic algorithm is enhanced over
its non-adaptive counterpart by the way the proposed algorithm
passes oversized nodes when a message is blocked. The method we
used for enhancing the extended-xy and OAPR algorithms is
simple, and its principle is similar to that of the previous
algorithm, if-cube2. Moreover, ANBM and ANRM are improved by
our proposed algorithm. In conclusion, improved-xy has better
performance than extended-xy and OAPR and is feasible for
Network-on-Chip.